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Abstract 

The  Deterministic  Versus  Stochastic  algorithm  developed  by  Martin  Casdagli  is  modi¬ 
fied  to  produce  two  new  prediction  methodologies,  each  of  which  selectively  uses  embedding 
space  nearest  neighbors.  Neighbors  which  are  considered  prediction-relevant  are  retained  for 
local  linear  prediction  (which  is  shown  a  reasonable  alternative  to  local  nonlinear  prediction), 
while  those  which  are  considered  likely  to  represent  noise  are  ignored.  The  new  algorithms 
may  in  this  sense  be  considered  to  employ  embedding  space  filtrations  of  the  time  series. 
For  many  time  series,  it  is  shown  rather  easy  to  improve  on  unhitered  local  linear  prediction 
with  one  or  both  of  the  new  algorithms.  For  other  time  series,  prediction  improvement  is 
more  difficult.  It  is  suggested  that  prediction  improvement  difficulty  is  indicative  of  stochastic 
data,  independently  of  the  direct  results  of  the  Deterministic  Versus  Stochastic  algorithm.  The 
theory  of  embedded  time  series  is  also  shown  capable  of  determining  a  reasonable  length  of 
test  sequence  sufficient  for  accurate  classification  of  moving  objects.  Sequentially  recorded 
feature  vectors  of  a  moving  object  form  a  training  trajectory  in  feature  space.  Each  of  the 
sequences  of  feature  vector  components  is  a  time  series,  and  under  certain  conditions,  each  of 
these  time  series  will  have  approximately  the  same  fractal  dimension.  Ths  embedding  theorem 
may  be  applied  to  this  fractal  dimension  to  establish  a  number  of  observations  sufficient  to 
determine  the  feature  space  trajectory  of  the  object.  It  is  argued  that  this  number  is  a  reasonable 
test  sequence  length  for  use  in  object  classification.  Experiments  with  data  corresponding  to 
five  military  vehicles  (observed  following  a  projected  Lorenz  trajectory  on  a  viewing  sphere) 
show  that  this  length  is  indeed  adequate. 
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EMBEDDED  CHAOTIC  TIME  SERIES:  APPLICATIONS 
IN  PREDICTION  AND  SPATIO-TEMPORAL 
CLASSIFICATION 


/.  Introduction 

1.1  Background 

It  is  often  easier  to  measure  and  record  data  than  it  is  to  explain  how  the  data  was 
generated.  It  is  one  thing,  for  example,  to  record  the  inflight  positions  of  an  enemy  aircraft 
but  quite  another  to  set  forth  equations  which  will  allow  accurate  prediction  of  its  future 
behavior.  Throughout  this  document,  a  time  series  will  refer  to  a  discrete  time-ordered  set 
of  real  numbers  corresponding  to  a  single  component  of  a  dynamical  system  (although  less 
restrictive  definitions  are  common  (14:3)).  In  the  case  of  a  flying  aircraft,  a  time  series  might 
be  recorded  for  the  measured  longitudinal  position  of  the  aircraft;  another  time  series  might 
represent  its  altitude. 

Some  time  series  are  predictable  with  much  greater  accuracy  than  others.  The  position 
of  a  drone  aircraft  with  locked  controls  is  far  easier  to  predict  than  the  position  of  a  piloted 
aircraft  conducting  evasive  maneuvers.  Certain  factors  lend  a  degree  of  unpredictability, 
however,  to  both  situations.  In  particular,  unknown  atmospheric  conditions  (such  as  winds 
aloft)  limit  the  long  term  predictability  of  both.  Other  factors  sharply  limit  even  the  short  term 
predictability  of  the  piloted  aircraft  (for  instance,  the  intellect  and  will  to  survive  of  the  pilot). 
Loosely,  a  chaotic  time  series  refers  to  a  time  series  generated  by  a  dynamical  system  having  a 
large  number  of  poorly  understood  controlling  influences.  Although  chaotic  is  a  relative  term, 
time  series  generated  by  the  piloted  aircraft  are  more  likely  to  be  referred  to  as  chaotic  than 
are  those  generated  by  the  drone. 
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More  precise  definitions  of  chaotic  dynamical  systems  suggest  the  often  intricate,  fractal 
structure  of  their  phase  space  portraits  (25:341).  It  wasn’t  until  the  early  1960’s  that  the  exis¬ 
tence  of  what  is  now  called  a  chaotic  dynamical  system  was  conclusively  demonstrated  (21). 
As  understanding  of  this  phenomenon  has  increased,  it  has  been  realized  that  a  sensitivity  to 
initial  conditions  is  inherent  in  chaos.  Even  given  the  equations  governing  a  chaotic  system, 
sensitivity  to  initial  conditions  destroys  long  term  predictability  in  the  sense  that,  given  an  ap- 
pmximation  to  an  object’s  phase  space  location,  the  divergence  of  nearby  trajectories  renders 
worthless  attempts  to  infer  future  nearness  from  current  nearness. 

Nevertheless,  it  is  possible  to  accurately  predict  chaotic  time  series,  sometimes  remark¬ 
ably  far  into  the  future.  Probably  the  best  current  understanding  of  how  this  is  possible  has 
its  foundations  in  what  Tim  Sauers  has  called  the  theory  of  embedology  (33).  This  will  be 
discussed  more  completely  in  Chapter  E,  but  roughly  what  it  involves  is  reducing  the  initial 
complexity  of  a  dynamical  system  to  a  few  essential  degrees  of  freedom  and  embedding  the 
given  time  series  in  a  space  whose  dimension  is  only  about  twice  that  number  of  degrees. 
An  important  theorem  published  by  Takens  (37)  in  1981  (its  generalization  by  Sauers  will  be 
called  the  embedding  theorem)  guarantees  that  the  resulting  embedding  is  a  faithful  repro¬ 
duction  of  the  original  dynamics.  Knowledge  of  the  behavior  of  portions  of  the  embedded 
time  series  near  the  portion  corresponding  to  its  end  value  may  then  be  exploited  to  form 
predictions.  Farmer  took  this  approach  in  1987  in  formulating  what  he  called  a  local  linear 
prediction  technique  (8). 

1.2  Problem  Statement 

Greater  prediction  accuracy  is  always  desirable.  The  local  linear  prediction  technique 
will  be  enhanced  by  excluding  detracting  global  influences  from  the  local  prediction  region. 
A  central  tenet  of  the  technique,  the  embedding  theorem,  will  also  be  shown  to  extend  to 
spatiotemporal  pattern  recognition. 
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1.3  Scope 

Farmer’s  basic  local  linear  approximation  methodology  has  since  been  refined  by  a 
number  of  researchers;  for  example,  (4,  33).  Most  refinements,  however,  retain  the  basic 
technique  of  using  globally  applicable  embedding  dimension  and  globally  applicable  number 
of  nearest  neighbors  to  predict  from  any  given  point  in  a  time  series.  It  might  be  imagined  that 
^plying  a  number  of  nearest  neighbors  appropriate  to  prediction  from  a  single  closest  point 
in  embedding  space  (an  extremely  local  approach)  might  offer  prediction  improvement.  This 
was  found  not  necessarily  the  case.  Improved  predictability  can  often  be  realized  by  pmning 
from  a  globally  determined  number  of  nearest  neighbors,  those  neighbors  which  are  least 
relevant  to  the  local  behavior  of  the  dynamical  system,  provided  the  system  is  not  extremely 
stochastic.  A  pruning  technique  is  introduced  which  eliminates  from  a  globally  ascertained 
number  of  nearest  neighbors,  those  neighbors  which  lie  farthest  from  the  regression  hyperplane 
that  they  determine.  It  is  further  shown  that  pruning  from  sets  of  neighbors  determined  by 
different  choices  of  delay  interval,  those  which  do  not  overlap  in  time,  can  also  enhance 
prediction  accuracy.  These  contentions  are  supported  by  examining  the  performances  of  both 
new  algorithms  against  pure  local  linear  prediction  for  six  time  series.  Ten  predictions  are 
provided  for  each  of  the  six  time  series.  Both  techniques  introduced  here  enhance  predictability 
by  reducing  noise,  in  that  they  provide  the  most  striking  benefits  for  time  series  usually 
considered  noisy,  such  as  experimental  measurements  and  financial  data.  These  approaches 
are  considered  in  Chapter  HI. 

The  existence  of  fractal  qualities  in  time  series  may  be  exploited  not  only  for  prediction 
but  also  for  classification.  Considerable  recent  research  focuses  on  the  problem  of  identify¬ 
ing  various  objects  using  not  only  the  spatial  content  of  observations,  but  also  information 
imparted  by  the  evolution  in  time  of  those  observations  (34,  20,  3,  12).  Training  sequences 
of  feature  vectors  are  obtained  from  numerous  sequentially  obtained  views  of  the  objects. 
Test  sequences  are  then  compared  to  the  training  data,  but  thus  far,  little  theoretical  justi¬ 
fication  has  been  provided  for  an  appropriate  test  sequence  length.  A  new  view  of  training 
sequences  is  here  provided.  Training  sequences  can  be  interpreted  as  projected  solution  curves 
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of  dynamical  systems.  It  is  shown  in  Chapter  IV  that  the  embedding  theorem,  applied  to  the 
fractal  dimensions  of  the  solution  curves,  yields  a  required  length  of  test  sequence  which 
experimentation  reveals  quite  adequate  for  a  classification  task. 

1.4  Outline 

The  following  chapter  presents  some  of  the  fundamental  theory  underlying  prediction 
of  chaotic  time  series.  Chapter  m  develops  two  modifications  of  the  local  linear  prediction 
method  introduced  by  Farmer  in  1 987  (8).  Chapter  IV  presents  an  application  of  the  embedding 
theorem  to  the  problem  of  moving  object  recognition.  Chapter  V  presents  some  conclusions. 
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11.  Background 


This  chapter  presents  some  of  the  fundamental  theory  underlying  prediction  of  chaotic 
time  series.  The  first  section  provides  some  concepts  from  dynamical  systems  theory.  The 
second  section  provides  an  overview  of  a  commonly  applied  technique  for  extracting  the 
correlation  dimension  from  time  series  data.  A  computer  program  which  implements  this 
algorithm  is  included  in  Appendix  C.  Sectic.  'hree  shows  how  knowledge  of  correlation 
dimension  can  be  applied  to  find  an  appropriate  embedding  dimension  for  a  time  series. 

2. 1  Basics  of  Dynamical  Systems  Theory 

Throughout  this  document,  a  time  series  will  refer  to  a  discrete  time-ordered  set  of 
real  numbers  corresponding  to  a  single  component  of  a  dynamical  system.  In  a  broad  sense, 
a  dynamical  system  is  simply  a  map  f  :  X  X  of  a  metric  space  (X,d)  (1:60)  (i.e., 
set  X  with  metric  d)  into  itself.  The  orbit  of  a  point  a:  in  X  is  the  sequence  of  points 
{x,  /(x),  /o/(x),  /o/o  /(x), . . .}  (2:134).  For  time  series  extraction,  attention  is  restricted 
to  the  particular  metric  spaces  3?”,n  >  1,  or  subsets  of  them,  with  the  metric  specified  in 
context. 

Consider  the  metric  space  X  =  [0, 1]  with  the  Euclidean  metric;  that  is,  the  closed  unit 
interval  where  the  distance  between  two  points  is  the  absolute  value  of  their  difference.  The 
identity  map  on  X  is  dynamical  system.  So,  too,  is  the  map  /  which  takes  each  element  x 
of  X  to  x/2.  The  orbit  of  the  point  1  under  /  is  the  sequence  {1, 1/2, 1/4, 1/8, . . .}. 

A  more  interesting  type  of  dynamical  system  can  be  defined  on  a  subset  C  of  [0, 1] 
called  the  (classical)  Cantor  set  (1:180).  The  Cantor  set  can  be  formed  by  iteratively  deleting 
open  middle  intervals  from  [0, 1].  Figure  1  shows  the  first  few  steps  in  its  generation.  Letting 
Bo  =  [0, 1],  Bi  =  [0, 1/3]  U  [2/3, 1],  B2  =  [0, 1/9]  U  [2/9, 1/3]  U  [2/3, 7/9]  U  [8/9, 1], 
. . . ,  C  is  defined  as  However,  it  is  possible  to  view  the  Cantor  set  as  the  unique 

“fixed  point,”  or  “attractor,”  of  an  “iterated  function  system”  (2:82).  To  this  end,  define  two 
“contraction  mappings”  wi  and  W2  from  3?  into  3?  by  iwi(x)  =  x/3  and  u;2(a;)  =  a:/3  +  2/3. 


5 


Bo  f - 1 

0  1 

Bi  f - i - X - * - ] 

0  1/3  2/3  1 

B2  e - 1-  x-it - i - X - * - K  -X  X - ] 

0  1/9  2/9  1/3  2/3  7/9  8/9  1 

Figure  1.  Building  the  Cantor  set 

An  intuition  of  the  reason  for  the  name  “contraction  mappings”  can  be  gained  by  noticing  that 
these  functions  shrink  any  interval  on  which  they  operate  (by  a  factor  of  1/3).  Together  with 
the  space  on  which  they  operate,  wi  and  W2  form  an  iterated  function  system.  The  term 
“iterated  function  system”  derives  from  the  fact  that  these  functions  are  understood  to  operate 
jointly  and  repeatedly  on  3?,  or  a  subset  of  3?.  Consider  the  actions  of  these  functions  on  the 
closed  unit  interval. 

At  the  first  iteration,  wi  takes  [0, 1]  to  [0, 1/3],  and  W2  takes  [0, 1]  to  [2/3,  Ij.  Thus  the 
set  Bi  results  from  one  application  of  wi  and  W2.  The  second  iteration  consists  of  applying 
wi  and  W2  to  Bi.  The  resulting  set  is  B2.  With  very  few  further  iterations,  it  should  be  clear 
to  the  reader  that  the  “infinite  iteration”  of  this  iterated  function  system  is  the  Cantor  set.  In 
fact,  Barnsley  provides  a  theorem  which  assures  that  C  will  result  from  applying  wi  and  W2 
to  any  nonempty  compact  subset  of  32  (2:82).  It  is  in  this  sense  that  C  is  referred  to  as  the 
attractor  of  this  iterated  function  system. 

The  Cantor  set  is  perhaps  the  quintessential  example  of  a  fractal  set.  A  rough  “mind’s 
eye”  imt^e  of  it  is  available  in  Figure  2;  it  xtually  consists  of  uncountably  many  points  but 
contains  no  intervals  (2:134)  (6:39).  Like  many  fractals,  it  is  self-identical  across  scale;  for 
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Figure  2.  Low  Resolution  Cantor  Set  C 

example,  if  the  portion  of  C  lying  in  the  interval  [0, 1/9]  is  expanded  by  a  factor  of  9,  all  of  C 
is  retrieved. 

A  dynamical  system  5  :  C  -♦  C  can  be  defined  by  taking  the  image  ^(a)  of  any  number 
a  in  C  to  be  the  preimage  of  the  only  map  wx  :C  C  ox  W2 ‘.C  C  whose  range  contains 
a.  This  type  of  dynamical  system  is  called  a  shift  dynamical  system  (2:144).  Appendix  A 
contains  an  elaboration  of  iterated  function  systems,  shift  dynamical  systems,  and  related 
concepts. 

The  word  dynamical  in  the  preceding  definition  of  dynamical  system  comes  from  the 
repeated  discrete  actions  of  a  function  on  a  set — the  dynamics  implied  is  the  discrete  evolution 
of  points  under  iterations  of  the  defining  map.  Another  definition  allows  for  the  continuous 
motion  of  points  in  a  set  (18:160).  Accordingly,  a  dynamical  system  is  a  continuously 
differentiable  map  4>  :  dt  x  S  S  where  is  an  open  subset  of  3?",  n  >  1,  and  letting 
<f>{x,t)  =  ^t(x),  the  map  <^t  ;  5  — ^  5  satisfies  (1)  the  map  <f>o  :  S  -*  S  is  \ho  identity;  and 
(2)  the  composition  <f>to<f>,  =  for  each  f ,  s  in  %. 

The  parameter  t  is  usuaUy  associated  with  time;  thus  the  magnitude  of  t2  (relative  to 
some  previous  time  fi)  gives  an  indication  of  how  long  a  point  x  has  wandered  about  in  S', 
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and  <f>t^(x)  gives  the  point  in  S  to  which  it  has  arrived  at  time  Property  (2)  demands  that  a 
terminal  location  be  independent  of  choice  of  initial  time. 

>^th  this  definition,  every  dynamical  system  gives  rise  in  a  natural  way  to  a  differential 
equation  (18:160).  Conversely,  the  fundamental  theorem  of  differential  equations  shows  that 
every  differential  equation  gives  rise  to  a  dynamical  system.  Consider,  for  example,  the 
differential  equation 

x'  =  Ax  (1) 

where 


x'  = 

dxi(t) 

dt 

,  A  = 

2  0 

,  and  X  = 

Xi(<) 

dX2(t) 

L  dt  J 

0  -1/2 

_  X2(t) 

For  any  initial  condition  xo  =  (a:i(0),  X2(0))^,  this  equation  has  the  unique  solution 

<^t(x)  =  (xi(0)e^‘,X2(0)e"‘/^) 

Given  any  point  (xi,X2)  in  therefore,  there  is  a  unique  solution  curve  (^t(x)  passing 
through  that  point.  Such  solution  curves  are  called  trajectories.  A  trajectory  may  be  thought 
of  as  the  path  taken  by  a  particle  placed  at  the  point  (xi,x2)  subject  to  a  set  of  forces 
represented  by  Equation  1 .  The  set  of  all  points  in  the  plane  are  acted  on  by  the  map  <f>t,t  G  3? 
to  determine  a  family  of  trajectories,  which  are  collectively  called  the  flow  on  3?^  determined 
by  Equation  1 .  A  qualitative  representation  of  the  flow  determined  by  Equation  1  is  given  in 
Figure  3. 

In  dynamical  systems  theory,  the  space  3IJ^  of  Figure  3  is  called  phase  space  (25:438). 
The  term  state  space  is  sometimes  used  instead  (18:22).  The  current  state  of  a  particle  —  that 
is,  its  location  (xi,  X2)  —  is  all  the  information  needed  to  characterize  for  all  time  the  motion 
of  the  particle  under  the  influence  of  the  forces  represented  by  Equation  1 . 

This  last  fact  is  a  consequence  of  the  fundamental  theorem  of  (ordinary)  differen¬ 
tial  equations  and  warrants  emphasis.  One  version  of  this  theorem  may  be  stated  as  fol¬ 
lows  (18:162).  Let  be  an  open  subset  of  3?”,  f  :  ►  3?”  a  continuously  differentiable 
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Figure  3.  Some  Solution  Curves  to  x'  =  Ax 


map,  and  xo  €  Then  there  is  some  a  >  0  and  a  unique  solution  x  :  (-a,  a)  of  the 
differential  equation  x'  =  f(x)  satisfying  the  initial  condition  x(0)  =  xq. 

This  theorem  assures  both  the  existence  and  uniqueness  of  solutions  to  a  large  class  of 
differential  equations.  As  an  immediate  consequence  of  uniqueness,  two  solution  curves  of 
x'  =  f(x)  cannot  cross.  Furthermore,  a  solution  curve  cannot  cross  itself  (18:168).  Thus 
solution  curve(s)  cannot  evolve  through  3?"  as  depicted  in  Figure  4.  A  solution  curve  which 
revisits  the  same  point  in  must  close  up  as  shown  in  Figure  5. 

A  dissipative  dynamical  system  is  one  which  contracts  phase  space  volumes.  Suppose, 
for  example,  that  all  points  in  the  phase  space  of  Figure  5  were  on  trajectories  that  converged 
to  the  closed  curve  depicted.  Then  any  small  volume  element  in  the  phase  space  would 
ultimately  be  contracted  onto  the  closed  curve.  Since  the  closed  curve  has  volume  zero,  the 
volume  element  was  contracted  by  the  dynamical  system.  Thus  the  dynamical  system  would 
be  dissipative.  On  the  other  hand,  if  trajectories  far  outside  the  closed  curve  diverged  away 
from  it,  then  some  volume  elements  would  expand  and  the  dynamical  system  would  not  be 
dissipative. 


9 


10 


A  chaotic  dynamical  system  is  a  dissipative  dynamical  system  which  has  one  or  more 
positive  Lyapunov  exponents  (40).  Wolf  et  al  provide  a  good  explanation  of  Lyapunov 
exponents  and  their  relationship  to  time  series  (42).  This  reference  also  provides  source 
code  for  a  program  which  finds  positive  Lyapunov  exponents.  All  numerical  methods  for 
determining  Lyapunov  exponents,  however,  involve  rather  subjective  choices  of  parameters, 
and  there  is  often  disagreement  over  whether  any  particular  time  series  is  or  is  not  chaotic. 

Essentially,  the  existence  of  at  least  one  positive  Lyapunov  exponent  assures  that  a 
chaotic  dynamical  system  stretches  small  volume  elements  in  one  or  more  directions;  but 
since  the  system  is  dissipative,  the  volume  elements  contract  to  volume  zero.  For  example,  a 
small  cube  in  9^  might  be  transformed  into  a  long  curve  in  3?^.  Perhaps  the  most  celebrated 
chaotic  dynamical  system  is  given  by  the  Lorenz  equations  (21:135): 

x'  =  —ax  +  ay 
y'  =  —xz  +  rx  —  y 
z'  =  xy  —  bz 

A  solution  curve  for  this  system  of  equations  (corresponding  to  the  initial  condition  x  =  y  = 
z  =  b,  and  with  (j  =  10,  r  =  28,  and  b  =  8/3)  winds  through  a  bounded  region  of 
without  ever  intersecting  itself  (16:29).  Its  projection  into  the  (x,  j/)-plane  is  approximated  in 
Figure  6.  This  figure  aiso  provides  a  “mind’s  eye”  image  of  the  projection  of  the  “limit  set”  in 
9?^  toward  which  trajectories  from  almost  all  initial  points  converge.  This  limit  set  is  called 
the  strange  attractor  associated  with  the  Lorenz  equations,  or  sometimes  simply  the  Lorenz 
attractor.  The  attractor  of  a  dynamical  system  is  of  considerable  interest  because  it  represents 
the  long  term  behavior  of  the  system. 

2.2  Dimensionality  of  Chaotic  Trajectories 

Often  a  strange  attractor  will  have  associated  with  it  a  fractal  quality  such  as  exhibited 
by  the  Cantor  set.  For  example,  (apparent)  pairs  of  inner  loops  of  the  strange  attractor 
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Figure  6.  A  Lorenz  Trajectory 


approximated  by  Figure  6  appear  to  be  nestings  of  pairs  farther  outside.  It  turns  out  that  a 
single  noninteger  number  can  characterize  the  self-similarity  of  the  Lorenz  attractor.  Other 
attractors  may  fail  to  display  a  fractal  quality  (7:625). 

The  quantity  used  to  characterize  the  self-similarity  of  trajectories  such  as  those  which 
approximate  the  Lorenz  attractor  is  called  fractal  dimension.  Actually  fractal  dimension  is 
something  of  a  catchall  phrase  used  to  describe  a  number  of  such  quantifications,  although 
some  authors  do  rigorously  define  the  term  (usually  in  a  manner  consistent  with  what  is 
descriptively  known  as  the  box  counting  dimension  (2:174)). 

Theiler  provides  a  geometric  intuition  of  fractal  dimension  as  a  generalization  of  the 
more  commonly  understood  notion  of  dimension  of  geometric  objects  (39:1059).  Dimension 
can  be  interpreted  as  an  exponent  that  expresses  the  scaling  of  an  object’s  bulk  with  its  size: 

bulk  ~  size*"*"'""”"  (2) 

The  “bulk”  in  this  formulation  may  correspond  to  a  mass,  a  volume,  or  some  measure  of 
information  content,  and  “size”  is  a  linear  distance.  The  volume  (bulk)  of  an  object  in  3^^ 
such  as  a  sphere,  for  example,  scales  cubically  with  its  diameter  (size),  so  the  object  is  three 
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dimensional.  Accordingly,  the  various  definitions  of  fractal  dimension  are  usually  cast  as 
equations  of  the  form 

log  bulk 

dimension  =  lim  - : — 

size-»0  log  size 

where  the  limit  of  small  size  is  taken  to  ensure  that  an  object’s  fractal  dimension  will  not 
change  if  it  is  linearly  transformed  (for  instance,  by  a  rotation).  This  small-size  limit  also 
makes  dimension  a  local  quantity  so  that  a  global  definition  of  fractal  dimension  requires  some 
kind  of  averaging. 

This  global  averaging  is  achieved  automatically  by  the  definition  of  capacity  or  box 
counting  dimension  (25:330)  (2:176).  Consider  an  attractor  A  in  3?”'.  Cover  3?’"  by  closed 
just-touching  hypercubes  of  side  length  Cr**  where  C  >  0  and  0  <  r  <  1  are  fixed  real 
numbers;  see  Figure  7,  where  m  =  2,  C  =  1,  and  r  =  1/2.  Let  A/’„(A)  denote  the  number  of 


Figure  7.  Finding  Capacity  Dimension;  A/2(^)  =  10  andA/3(A)  =  23 
hypercubes  of  side  length  (7r"  which  intersect  A.  If  the  limit 


exists,  then  A  has  capacity  dimension  D. 
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A  time  series  xi,  0:2,  X3, . . .  consists  of  a  sequence  of  real  numbeis,  hence  there  is  little 
a  priori  geometric  intuition  behind  the  notion  of  fractal  dimension  of  a  time  series.  Suppose, 
however,  that  the  time  series  consists  of  samples  equally  spaced  in  time  and  is  delay  coordinate 
embedded  into  a  space  of  dimension  higher  than  one,  in  the  following  manner.  Fix  some  small 
integer  k  >2.  Form  from  the  time  series  a  set  of  points  x,,  i  =  1, 2, ...  in  by  taking 
successive  fc-tuples  from  the  sequence  xi,  X2,  X3,  —  That  is. 


Xi 

•  •,Xjt) 

X2 

—  (X2,  X3,  . 

•  •  5  X^.).l  ) 

X3 

(*^3)  ^4 :  * 

Then  the  resulting  set  of  points  may  ind  od  have  a  fractal  structure  in  3?*^,  similar  perhaps  to 
the  attractor  A  depicted  in  Figure  7.  The  spaces  3?*^  are  called  embedding  or  reconstruction 
spaces. 

Grassberger  and  Procaccia  provided  in  1983  an  elegant  algorithm  for  finding  a  fractal 
dimension  of  a  time  series  using  just  such  delay  coordinate  embeddings  (17).  They  termed 
the  dimension  which  results  from  their  algorithm  the  correlation  exponent  (denoted  1/)  for  the 
time  series,  but  the  term  correlation  dimension  is  now  commonly  used.  They  demonstrated 
that  correlation  dimension  is  bounded  above  by  information  dimension  (another  commonly 
used  measure  of  fractal  qualities  (38))  which  in  turn  is  bounded  above  by  capacity  dimension. 
For  most  attractors  studied,  these  bounds  are  quite  tight.  As  will  be  seen,  great  accuracy  of 
fractal  dimension  estimation  is  not  necessary  for  implementation  of  the  embedding  theorem, 
which  is  the  foundation  for  the  local  linear  prediction  method.  For  this  reason,  the  term  fractal 
dimension  will  often  be  used  to  mean  the  approximately  equal  value  of  any  of  these  quantities. 

The  Grassberger  and  Procaccia  algorithm  is  easy  to  describe  and  to  'inplement;  Refer¬ 
ence  (36:3-7),  for  example,  provides  a  succinct  explanation.  It  involves  examining,  for  each 
of  a  small  set  of  embedding  dimensions  k,k  +  l,k  +  2,. . .  ,k  +  s,the  embedded  time  series 
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data  to  see  if  it  exhibits  an  exponential  scaling  as  in  Equation  2.  Here  k  is  taken  slightly 
larger  than  the  actual  correlation  dimension  i/',  since  1/  is  unknown,  additional  embedding 
dimensions  are  used  to  ensure  that  </  has  indeed  been  exceeded,  and  that  ^  ^  exceeds  about 

2i/.  The  condition  indicative  of  the  existence  of  a  fractal  structure,  the  linearity  of  log-log 
plots,  is  always  displayed  for  values  of  embedding  dimensions  in  excess  of  about  2i/  (and 
often  for  smaller  values).  In  fact,  the  value  of  u  is  the  common  slopes  of  these  plots. 

Using  the  embedding  dimension  k,  the  Grassberger  and  Procaccia  method  locates  all 
contiguous  k-tuples  from  the  time  series  as  points  in  A  set  of  small  numbers  /,  is  chosen 
(about  eight  is  adequate),  and  for  each  i  the  number  of  pairs  of  contiguous  A;-tuples  within 
Euclidean  distance  /,  of  each  other  is  determined  and  denoted  C{li).  Figure  8  illustrates  a 


Figure  8.  Counting  Close  fc-tuples  (fc  =  3) 

case  where  the  3-tuple  {Pm,  Pm +1,  Pm +2)  is  close  lo  {Po,  Pi,  P2)  in  Euclidean  distance  and 
would  likely  contribute  to  the  count  C'(/,)  for  most  U.  On  the  other  hand,  (P5,  Pe,  Pj)  is 
relatively  far  from  (Po,  Pi,P2)  and  might  not  contribute  to  the  count  C(l,)  for  any  of  the 
selected  /<.  The  Grassberger  algorithm  can  be  implemented  by  taking  the  smallest  /,  and 
the  leftmost  3-tupIe  (Po,  Pi,  P2)  and  checking  each  3-tuple  to  the  right  for  its  adherence  to 
the  /,  distance  criterion  -  first  (Pi,  P2,  P3),  then  (Pj,  P3,  P4),  continuing  until  all  contiguous 
3-tuples  are  compared  with  (Po,  Pi,  P2).  After  the  rightmost  3-tuple  has  been  compared,  the 
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process  repeats  beginning  at  the  left  with  (i^,  Pj,  P3).  The  C'(/,)  counter  continues  to  grow 
as  more  pairs  of  3-tuples  are  found  to  fall  within  the  allowable  /,  distance.  After  the  rightmost 
3-tuple  is  compared  with  (Pi,  Pj,  P3),  (P2,  P3,  P4)  serves  as  the  leftmost  3-tuple  for  the  next 
iteration. 

Continuing  in  like  manner,  leftmost  finally  becomes  rightmost,  and  the  counter  C{li) 
indicates  the  total  number  of  contiguous  3-tuples  which  lie  within  distance  /,  of  each  other. 
The  numbers  /,  and  C(/i)  are  stored,  another  Ij  is  selected,  and  the  process  begins  anew  with 
(Po,  Pi,  P2)  at  the  left.  It  terminates  with  C{lj)  determined.  All  pairs  (/„,  (^(/n))  are  likewise 
determined  and  stored. 

If  the  pairs  (ln(/j),  ln(C(/i)))  are  plotted,  and  are  found  nearly  collinear,  then  the  slope 
of  a  line  through  these  points  provides  a  good  approximation  of  the  fractal  dimension  of  the 
dynamical  system  which  produced  the  time  series.  See  Figure  36  and  Figure  40  for  examples 
of  such  plots;  10  =  1  in  these  plots. 

A  local  linear  prediction  methodology  will  be  described  in  the  next  section.  This 
methodology  is  based  on  the  existence  of  a  unique  finite  fractal  dimension  of  a  solution 
trajectory,  rather  than  on  the  chaos  of  the  system  which  produced  it.  It  might  make  more 
sense,  therefore,  to  speak  of  fractal  time  series  prediction  rather  than  chaotic  time  series 
prediction.  Following  Farmer  (8),  however,  the  latter  terminology  is  preserved. 

2.3  Embedology  and  Prediction 

The  process  of  successfully  extracting  a  correlation  dimension  v  reveals  that  beyond  a 
certain  integer  dimension  m  for  the  delay  coordinate  embedding  space,  the  calculated  value 
of  u  remains  nearly  constant.  Putting  the  numerical  process  of  dimensionality  determination 
aside,  there  is  (subject  to  certain  mild  technical  conditions  (33:13))  a  theoretical  value  M  of 
embedding  space  dimension  beyond  which  1/  cannot  change.  This  is  because  for  values  m  > 
M ,  the  delay  coordinate  embedding  of  the  state  space  trajectory  into  reconstruction  space  3?'” 
is  (in  a  probability-one  sense)  a  smooth  diffeomorphism,  and  correlation  dimension  is  invariant 
under  such  mappings  (32:178)  (27:368).  (A  diffeomorphism  is  a  differentiable  mapping  from 
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one  open  subset  of  a  vector  space  to  another  with  a  differentiable  inverse  (18:242);  to  say  that 
it  is  smooth  means  that  its  derivative  is  continuous  (33:6).) 

More  to  the  point,  Sauer  showed  that  as  long  as  M  >  2d,  where  d  is  the  box  counting 
dimension  of  the  trajectory  in  state  space  the  delay  coordinate  mapping  from  the  trajectory 
into  is  one-to-one  (32:178).  This  is  the  essence  of  what  he  called  the  fractal  delay  coordi¬ 
nate  embedding  theorem,  which  is  a  generalization  of  Takens’  theorem.  Although  correlation 
dimension  is  always  less  than  or  equal  to  the  box  counting  (aka  fractal  (2:176))  dimension, 
the  difference  between  them  is  rarely  lai^er  than  0.05  (17:193)  and  the  embedding  theorem 
will  often  be  invoked  using  box  counting,  fractal  and  correlation  dimensions  interchangeably. 

The  one-to-one  property  has  important  implications  for  prediction.  The  fundamen¬ 
tal  theorem  of  differential  equations  assures  that  a  phase  space  solution  trajectory  is  com¬ 
pletely  determined  by  any  point  on  it  (18:162).  A  delay  coordinate  embedding  into  re¬ 
construction  space  3?"*,  m  >  M,  shares  this  property.  In  particular,  given  any  point 

y{t  —  l)r))  on  the  reconstruction  space  trajectory,  its  image  on  the 

phase  space  trajectory  has  a  unique  succeeding  point.  The  y-component  of  that  succeeding 
point  is  y(t  +  r);  see  Figure  9.  If  the  point  {y{t),  y{t  -  t),.  . .  ,y{t  -  {m  -  1)t))  reappears 
on  the  reconstmction  space  trajectory  at  a  time  t*  >  t,  so  that  (y(<*),  y{t*  -  t),  . . . ,  y{t*  - 
(m  —  l)r)  =  (j/(t),  y{t  —  t),  . . . ,  y(t  —  (m  —  l)^)),  then  necessarily  the  next  point  on  the 
reconstruction  space  trajectory  is  the  same  for  both  times  (and  the  trajectories  in  both  spaces 
are  closed  curves,  as  in  Figure  5).  That  is,  y{t*  +  t)  =  y{t  -H  t).  If  t*  happens  to  be  the 
current  time,  then  a  match  of  any  previous  m  consecutive  time  series  values  (with  the  m-tuple 
ending  with  y(t*))  provides  a  means  of  predicting  y{t*  -|-  r):  simply  read  off  the  known  next 
value  of  the  time  series,  y{t  -|-  t). 

2.4  Conclusion 

This  chapter  has  introduced  the  concepts  of  chaotic  dynamical  systems  and  fractal 
properties.  It  is  stressed  that  for  all  but  a  few  simple  examples,  numerical  techniques  must 
be  relied  upon  to  reveal  the  presence  or  absence  of  chaotic  dynamics,  and  the  results  of  these 
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Figure  9.  An  Embedding  in  Reconstruction  Space  3?*" 

techniques  are  often  open  to  subjective  interpretation.  The  terminology  chaotic  time  series  is 
herein  understood  to  designate  a  time  series  which  exhibits  a  unique  finite  fractal  dimension. 
In  this  work,  this  will  generally  mean  that  Grassberger  and  Procaccia  analyses  reveal  nearly 
linear  log-log  plots. 

The  following  chapter  will  build  on  the  important  embedding  theorem  to  develop  two 
modifications  of  the  local  linear  prediction  method  introduced  by  Farmer. 
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III.  Further  Localizing  Local  Linear  Prediction 

3.i  Introduction 

This  chapter  develops  some  modifications  of  the  local  linear  prediction  method  intro¬ 
duced  by  Farmer  in  1987  (8).  Section  3.2  reviews  the  basic  method.  Section  3.3  provides 
insight  into  the  method  by  demonstrating  some  conditions  under  which  linear  prediction 
is  exact.  Section  3.4  reviews  some  important  enhancements  to  the  local  linear  prediction 
method  which  were  introduced  by  Casds^li  (4).  He  showed  how  to  determine  appropriate 
data-dependent  parameters  for  use  with  the  method.  These  parameters  consist  of  the  best  pair 
(m,  k)  of  embedding  dimension  m  and  number  k  of  nearest  neighbors  to  use  on  average  for 
prediction  from  anywhere  within  a  reserved  portion  (called  the  “testing  set”)  of  the  available 
time  series  data.  It  may  seem  that  prediction  accuracy  could  be  enhanced  by  using  the  value 
of  k  found  optimal  for  a  single  point  in  the  embedding  space  which  is  nearest  the  desired 
prediction  point.  It  is  shown  in  Section  3.5,  however,  that  such  a  localization  of  parameter 
estimation  often  resulted  in  reduced  prediction  accuracy.  The  next  two  sections  describe  the 
two  pruning  algorithms  developed  during  this  research,  and  provide  support  for  the  contention 
that  they  often  enhance  prediction  accuracy  over  pure  local  linear  prediction  for  time  series 
which  are  not  extremely  stochastic.  Section  3.6  explains  a  method  for  predicting  from  a  given 
point  by  praning  from  a  set  of  its  neighbors  (of  globally  determined  optimal  size,  or  slightly 
larger)  those  neighbors  which  are  least  typical.  Section  3.7  then  gives  a  method  of  prediction 
from  a  given  point  based  on  two  optimal  time  delays  associated  with  the  given  data.  Sets  of 
nearest  neighbors  are  found  corresponding  to  each  of  the  delays,  and  one  of  the  sets  is  pruned 
based  on  the  overlap  of  its  constituent  time  intervals  with  those  of  the  other  set.  Three  or 
more  delays  could  be  used  with  a  suitable  overlap  criterion.  Some  time  series  not  described 
previously  in  the  chapter,  including  some  financial  time  series,  are  considered  in  Section  3.8. 
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3. 2  Local  Unear  Prediction 


As  noted  earlier,  any  time  series  obtained  from  a  dynamical  system  having  a  fractal 
dimension  may  be  delay  coordinate  embedded  into  a  Euclidean  space  of  sufficiently  high 
dimension  that  the  embedding  is  dynamically  equivalent  to  the  original  system.  Suppose  the 
time  series  x(f),  t  =  1,2,...  corresponds  to  any  component  of  such  a  system,  and  that  the 
system  has  fractal  dimension  d.  The  embedding  theorem  implies  that  if  m  is  an  integer  greater 
than  2d,  then  each  set  of  points  {x(f  —  l),x(#  —  2), . . . ,  x(f  —  m)}  uniquely  determines  a 
next  point  x(f). 

The  embedding  theorem  gives  a  sufficient  condition  for  dynamic  equivalence,  not  a 
necessary  one.  That  is,  there  may  be  a  space  of  integer  dimension  less  than  2d  which  also 
accepts  an  embedding  of  the  original  dynamics.  This  is,  in  fact,  the  case  for  a  time  series 
extracted  from  the  Lorenz  equations,  since  the  Lorenz  attractor  has  a  fractal  dimension  of 
about  2.05.  Consider  a  time  series  obtained  by  sampling  the  function  depicted  in  Figure  10. 
Suppose  it  is  possible  to  embed  the  dynamical  system  from  which  the  time  series  was  obtained 

x(t) 


Figure  10.  Selected  (Value,  Next  Value)  Pairs  of  a  Time  Series 
in  one  dimension,  so  that  its  reconstruction  space  phase  portrait  is  a  function  of  one  variable 
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as  depicted  in  Figure  11.  In  Figure  11,  every  present  value  y  has  a  unique  next  value  y*. 


X* 


Figure  11.  Selected  Next  Values  Over  a  One-Dimensional  Embedding  Space 

although  only  three  such  (value,  next  value)  pairs  are  shown. 

Suppose  the  last  known  value  of  the  time  series  is  px,  and  it  is  desired  to  predict  its 
value  p*  at  the  end  of  the  next  sample  period.  One  method  of  prediction  is  to  perform  a  linear 
regression  using  the  nearest  neighbors  to  the  point  px,  i.e.,  using  the  time  series  values  closest 
to  px  (regardless  of  their  time  indices)  -  in  this  example,  the  points  a*  and  hx.  That  is,  a  line  is 
formed  joining  the  pairs  of  points  (cj;,  a*)  and  [hx,  h*)  in  3?^.  The  value  p*  assumed  by  this 
regression  line  at  px  is  the  predicted  value. 

Linear  regression  generalizes  easily  to  Euclidean  spaces  of  arbitrary  positive  integer 
dimension  (4:307).  It  typically  involves  using  more  than  the  minimum  number  of  nearest 
neighbors  necessary  to  establish  a  hyperplane;  in  the  case  of  Figure  11,  more  than  the  two 
points  nearest  p*.  In  these  cases,  a  hyperplane  is  fit  to  the  available  (neighbor,  next  value)  pairs 
in  such  a  way  as  to  minimize  the  resulting  squared  error.  In  essence,  a  least-squares-optimal 
hyperplane  is  used  to  approximate  the  (neighbor,  next  value)  surface  in  the  neighborhood  of 
the  last  known  portion  of  the  time  series.  See  Figure  12,  for  example,  in  which  the  next  value 
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*n+2 


•••  •  *  i  •  *  i+1  ’  *  i+2 . *  k  ’  *  k+1 

Figure  12.  Local  Linear  Prediction  with  a  Two-Dimensional  Embedding  Space 

XkJi.7  of  a  time  series  is  predicted  using  a  plane  fit  through  four  (neighbor,  next  value)  points 
corresponding  to  the  four  neighbors  closest  to  the  point  (ifc,  ijt+i). 


3.3  Some  Sufficient  Conditions  for  Perfect  Linear  Predictability 

Consider  again  the  prediction  of  the  first  unknown  value  p*  of  the  time  series  depicted  in 
Figure  10.  Under  what  circumstances  will  linear  regression  prediction  yield  the  exact  value? 

A  perfea  linear  regression  prediction  is  shown  in  Figure  1 1 .  That  is,  if  the  value  of 
p*  is  as  indicated,  then  linear  regression  gives  its  value  exactly.  The  only  requirement  for 
the  points  (ft*,  6*),  (p,;,  p*),  and  {an,  a*)  to  be  collinear  is  that  the  slopes  of  the  lines  joining 
{bx,  bl)  to  (pr ,  p* )  and  (p,,  pi)  to  [a^,  <)  be  equal.  That  is. 


6*  ^  • 

x—Px  _  Px  ~  Qx 

bx  Px  Px  Ux 


(3) 


The  time  series  from  the  sampled  function  x{t)  is  assumed  chaotic.  This  means  that  the 
function  x{t)  represents  a  component  of  a  chaotic  dynamical  system.  For  simplicity,  assume 
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this  system  has  an  attractor  which  can  be  represented  in  two  dimensions.  Assumptions  about 
the  attractor  which  make  Equation  3  valid  will  now  be  considered. 

Suppose  the  attractor  is  a  dense  cluster  of  loops  that  are  nearly  concentric  circles.  This  is 
simplified  further  in  Figure  13  by  showing  three  collinear  points  a,  p  and  b  on  three  concentric 


Figure  13.  An  Attractor  as  Concentric  Racetracks 

circular  loops  (shown  disconnected)  of  the  attractor.  Suppose  that  their  respective  succeeding 
points  a*,p*  and  b*  evolved  counterclockwise  and  in  such  a  way  as  to  remain  in  perfect 
alignment.  If  the  lines  ab  and  a*6*  intersect  at  the  center  (x,  y)  =  (0, 0)  of  the  imaginary 
attractor,  then  the  ratios  of  their  x-component  increments  remain  unchanged.  That  is, 

k.-  (4) 

Px  -  Or  Pi-  al 

since  each  ratio  is  equal  to 

Rb  —  Rp 

Rp  —  Ra 


23 


by  similar  triangles.  Since  Equation  4  is  simply  a  cross  multiplication  of  Equation  3,  this 
assumed  configuration  of  points  results  in  perfect  linear  regression  predictability.  This  con¬ 
figuration  is  analogous  to  being  near  a  “center”  of  an  attractor.  Relatively  slight  perturbations 
of  this  assumed  geometry  can  seriously  degrade  the  equality  of  the  ratios  of  ar-component 
increments.  For  example,  consider  the  effect  of  a  shift  of  the  intersection  of  the  lines  ab  and 
0*6*  to  the  point  —Ral^  on  the  y-axis. 

On  the  other  hand,  if  the  points  under  consideration  are  relatively  far  from  any  “center” 
of  an  attractor,  an  adequate  model  of  the  local  dynamics  might  be  as  shown  in  Figure  14. 
The  line  segments  aa*,  pp*  and  bb*  are  assumed  parallel,  as  are  the  lines  through  the  sets  of 

y 


Figure  14.  Parallel  Segments  of  an  Attractor 

collinear  points  a,p  and  6,  and  a*,p*  and  6*.  In  this  case,  too,  the  ratios  of  x-component 
increments  remain  unchanged,  so  linear  regression  predictability  is  again  perfect. 

In  general,  linear  regression  involves  using  more  than  the  minimum  number  of  points 
necessary  to  establish  a  hyperplane.  An  n-dimensional  hyperplane  requires  at  least  n  -t- 1 
points  for  its  linear  regression  determination  (29:34),  but  often  more  than  n  -|-  1  points  are 
available.  In  the  present  case,  linear  regression  often  involves  using  more  than  two  points 
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to  establish  a  linear  regression  line.  If  an  additional  neighbor  c  of  the  point  p  is  used,  and 
if  the  point  (cx,  c*)  is  collinear  with  the  points  (a^,  a*)  and  (6x,  6*)  of  Figure  11,  then  the 
rule  for  perfect  linear  regression  predictability.  Equation  3,  is  unchanged.  That  is,  all  6’s, 
say,  in  Equation  3  can  simply  be  replaced  with  c’s.  However,  if  (c^,  c*)  is  not  collinear  with 
the  points  (a^,  a*)  and  K)’  perfect  linear  predictability  equates  to  the  perfect  fit  of 
the  point  {px,pl)  to  the  regression  line  determined  by  the  three  points  (ox,  a*),  (6x,  ^x) 
(cx,Cx).  Assumptions  of  perfect  linear  predictability  always  reduce  to  assumptions  on  the 
unknown  value  p*. 

This  development  generalizes  to  spaces  of  integer  dimension  greater  than  two.  The 
intuition  gained  is  that  if  past  local  trajectory  segments  closely  resemble  the  present  segment, 
then  local  linear  prediction  is  likely  to  be  quite  accurate. 

3. 4  Identification  of  Optimal  Parameters 

Given  a  time  series  for  which  local  linear  prediction  is  to  be  implemented,  a  number 
of  parameters  must  first  be  determined.  It  is  assumed  that  the  time  series  is  fractal,  that  is, 
that  a  correlation  dimension  d  has  been  identified  for  it.  The  embedding  theorem  guarantees 
that  the  time  series  may  be  embedded  in  a  space  of  any  integer  dimension  m  greater  than  2d, 
There  may  be,  however,  a  space  of  dimension  less  than  or  equal  to  2d  in  which  the  time  series 
can  also  be  embedded.  In  any  case,  for  purposes  of  linear  prediction,  some  advantage  may  be 
realized  by  choosing  one  sufficiently  large  embedding  dimension  over  another. 

Furthermore,  whatever  embedding  dimension  m  is  chosen,  some  number  k  of  neighbors 
nearest  the  prediction  point  in  the  embedding  space  (i.e.,  the  final  m-tuple  of  the  time  series) 
must  be  chosen.  In  general,  a  best  value  of  k  will  depend  on  the  chosen  value  of  m. 

Assuming  a  limitless  supply  of  time  series  data,  what  is  the  best  number  N  of  time 
series  values  to  use?  Furthermore,  should  every  time  series  value  be  used,  or  should  gaps  be 
allowed  between  them?  For  example,  should  only  every  time  series  value  be  used,  where 
the  delay  time  r  might  exceed  one?  What  is  the  desired  prediction  horizon  -  one  time  unit 
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into  the  future,  or  T  time  units?  These  are  some  of  the  decisions  which  need  to  be  made;  in 
practice,  they  are  often  made  by  trial  and  error. 

Martin  Casdagli  presented  a  technique  for  finding  both  the  best  m  and  k  for  a  given 
time  series  (4).  It  is  essentially  an  exhaustive  search  of  all  possible  m’s  and  k's  for  the  pair 
of  values  (m,  k)  which  yields  the  lowest  prediction  error  over  a  restricted  portion  of  the  time 
series. 

Although  Casdagli  does  not  elaborate  the  point,  some  attempt  should  be  made  to 
“detrend”  the  given  data,  ideally  to  create  a  time  series  having  locally  identical  statistics.  For 
example,  many  financial  time  series  exhibit  strong  upward  trends,  and  the  final  embedding 
point  of  the  raw  data  from  such  a  time  series  may  have  too  few  neighbors  to  perform  a 
meaningful  regression.  Perhaps  the  simplest  way  to  detrend  the  data  is  to  replace  the  given 
data  values  Xi  with  their  differences  from  the  succeeding  values;  that  is,  the  given  time  series 
xi,X2,  X3, . . .  is  replaced  with  the  time  series  yi  =  X2  —  xi,y2  =  —  X2, . . .  (24:257).  The 

resulting  time  series  is  then  normalized  to  zero  mean  and  unit  variance  (5:351).  That  is,  each 
time  series  value  y,  is  then  replaced  with  {yi  —  fi)/<T  where  ji  is  the  average  value  of  all  the  y, 
and  a  is  their  standard  deviation.  This  was  the  approach  taken  throughout  most  of  this  research. 
Hereafter,  the  resulting  normalized  detrended  time  series  is  denoted  xi,  xj,  X3, . . . ,  x;v.  The 
following  steps  were  then  applied,  preserving  Casdagli’s  notation.  Figure  15  illustrates  most 
of  the  quantities  involved. 

(a)  IXvo  distinct  portions  of  the  time  series  are  identified,  one  consisting  of  the  first  Nj 
time  series  values  xi,  X2, . . . ,  x^v,  (called  the  fitting  set)  and  the  other  of  the  final  Nt  values 

XA^^+2,  •  •  • ,  (called  the  testing  set).  Thus  Nj-\-Nt  =  N. 

(b)  Some  small  value  of  m  is  selected  (typically  three  or  four;  it  will  be  incremented), 
some  delay  time  t,  and  some  prediction  horizon  T.  In  general,  t  and  T  need  not  be  equal, 
but  they  were  always  taken  equal  in  this  work. 

(c)  Now  the  m-tuple  Xj  ending  with  x„  i  >  Nj,  is  chosen  for  a  T -step  ahead  forecasting 
test.  The  index  i  will  be  incremented  by  one  across  as  much  of  the  testing  set  as  possible;  that 


26 


embedding  space  ( selected  points  in  9t^ ) ; 


Figure  15.  Illustration  of  Embedding  Parameters 
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is,  until  i  +  T  >  N.  The  m-tuples  x,  arc  called  delay  vectors,  or  sometimes  fitting  vectors  (if 
z,  is  in  the  fitting  set)  or  test  vectors  (if  z,  is  in  the  testing  set). 

(d)  The  distances  dij  from  the  test  vector  x,  to  the  fitting  vectors  Xj  arc  computed, 
using  the  maximum  norm  (i.e.,  maximum  of  the  absolute  differences  of  paired  component 
values)  for  computational  efficiency.  Here  j  ranges  over  the  set  of  integers  in  the  interval 
[1  +  (m  -  l)T,Nf  -  T]. 

(e)  The  distances  dij  arc  ordered  so  that  the  relative  proximities  of  the  fitting  vector 
neighbors  to  x,  can  be  determined.  The  neighbor  nearest  x,  is  labeled  Xj(i);  the  next  nearest 
neighbor  is  labeled  Xj(2);  etc.  A  least  squares  hyperplane  in  3?”*+^  is  then  fitted  to  the  m  +  1- 
tuplcs  formed  from  the  k  nearest  neighbors  in  Sft"*  and  their  respective  “forecasted”  values. 
That  is,  the  parameters  a,  are  determined  by  least  squares  from  the  following  affine  model; 

m 

Xj(l)+T  ^  Otfl  4'  y  ]  Q;nZj(/)_(n— 1)t>  ^  —  1,  .  .  .  ,  A:  (5) 

n=l 

The  number  k  of  nearest  neighbors  is  incremented  from  2(m  + 1)  to  N j  ~  T  —  {m  -  1)t,  and 
a  prediction  error  is  calculated  for  each  k  (see  step  (f)).  Exponential  spacing  of  Ar’s  <s  usually 
adequate;  for  example,  the  values  for  k  may  be  taken  as  the  set 

{2(m  +  1), 2(m  +  1)  +  2®,  2(m  +  1)  +  2*,2(m  +  1)  +  2^ . . . ,  <  yV/  -  r  -  (m  -  l)r} 

Each  of  the  k  approximations  in  Equation  5  represents  an  error  which  depends  on  the  param¬ 
eters  ao,ai,...,am.  Squaring  these  errors  (i.e.,  the  differences  of  the  left-  and  right-hand 
sides  of  these  approximations),  and  summing  them,  gives  an  error  function  dependent  on 
these  m  -|- 1  parameters.  Setting  the  partial  derivatives  of  this  error  function  to  zero  gives  a 
system  of  m  -f  1  equations  (the  ‘normal”  equations)  in  m  -f  1  unknowns.  The  solution  of  this 
system  (assuming  it  has  a  unique  solution)  is  the  set  of  values  a,  which  give  the  desired  least 
squares  hyperplane.  The  normal  equations  may  be  updated  recursively  as  ifc  is  increased.  Let 
m  =  m  -  1.  In  vector  notation,  the  system  of  equations  actually  being  solved  is  A  a  =  b 
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where  A  is  the  matrix 
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and  where  the  vectors  <t  and  b  are  given  by 


a  = 
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Notice  that  all  of  the  sununations  can  be  updated  as  A;  is  increased  simply  by  adding  additional 
terms.  The  actual  solution  of  the  system  Aa  =  6  is  facilitated  using  LU  decomposition;  see 
e.g,  (29:44-48)  and  Appendix  C. 

(f)  Using  the  hyperplane  determined  by  Equation  5,  a  T-step  ahead  forecast  i,+r(fc) 
is  made  for  the  test  vector  x,  and  for  all  applicable  k;  that  is,  the  estimated  value  Xi+T{k)  of 
Xi+T  is  calculated  as 

m 

Xi-^T^k')  ~  "I"  y  '  (n— 1)t 

n=l 

For  Nf  <  i  <  N  —  T,  the  error  e,(A:)  =  |xj+7’(A:)  -  x,+r|  is  computed. 
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(g)  Steps  (c)-(f)  are  repeated  for  all  applicable  i  and  the  normalized  root-mean-square 
(RMS)  forecasting  error  Em{k)  is  calculated  as 

=  {[^E  '?(‘)1/(M  -  2’+ 

i=Nf 

where  a  is  the  standard  deviation  of  the  time  series  (a  =  1  for  time  series  which  have  been 
normalized  to  unit  variance).  If  the  time  series’  mean  fi  and  variance  are  shared  by  the 
testing  set  data,  then  the  trivial  predictions  Xi+rik)  =  n  give  a  normalized  RMS  forecasting 
error  of  one.  On  the  other  hand,  if  the  predictions  Xi+T{k)  are  perfect,  then  Emik)  =  0. 

(h)  Finally,  the  embedding  dimension  m  is  varied  and  plots  are  made  of  Emik)  versus 

k. 

The  algorithm  described  above  (with  minor  modifications)  has  recently  been  named 
the  DVS  (for  deterministic  versus  stochastic)  algorithm  (5).  The  name  derives  from  the  fact 
that  the  shapes  of  the  resulting  plots  can  provide  evidence  of  low  dimensional  deterministic 
chaos,  or  of  high  dimensional  or  stochastic  dynamics.  Low  dimensional  chaos  is  typically 
characterized  by  U-shaped  or  monotonically  increasing  plots  whose  minimum  Em{k)  values 
are  small  and  occur  at  low  values  of  k.  High  dimensional  or  stochastic  behavior  is  often 
indicated  by  relatively  large  minimum  E^ik)  values  occurring  at  high  k  values.  Compare  for 
example  Figure  16,  which  represents  data  derived  from  the  low  dimensional  Henon  attractor, 
and  Figure  17,  which  may  be  surmised  to  represent  data  of  higher  dimension  or  more 
stochastic  origin.  In  fact.  Figure  17  was  obtained  from  a  time  series  derived  from  views  taken 
of  a  military  vehicle  by  an  observer  following  a  raster-style  traversal  of  a  viewing  sphere  (see 
Chapter  IV). 

The  absolute  minimum  value  attained  by  the  curves  provide  the  best  m  and  k 
values  to  use  on  averr^e  for  local  linear  prediction  within  the  testing  set;  see  for  example 
Figure  17,  where  the  best  m  is  10  and  the  best  k  is  86. 

The  normal  equations  of  the  DVS  algorithm  do  not  necessarily  admit  a  unique  solution; 
basically,  three  (or  more)  points  in  3^2^  do  not  determine  a  unique  plane  if  the  points  happen 
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Figure  16.  DVS  Plots  of  Low  Dimensional  Data  (from  Henon  map) 

to  be  coUinear  (the  idea  generalizes  to  higher  dimensional  spaces).  Furthermore,  it  is  possible 
that  some  of  the  points  in  the  various  embedding  spaces  S’"  may  be  revisited  from  different 
portions  of  the  time  series;  in  the  worst  case,  for  example,  the  k  nearest  neighbors  of  a 
point  in  S’"  might  all  be  identical.  The  latter  condition  was  not  often  observed  in  this  work; 
the  embedded  data  was  examined  for  uniqueness  before  applying  the  DVS  algorithm,  and 
in  the  few  cases  where  overlap  existed  for  small  values  of  m,  taking  slightly  larger  values 
of  m  eliminated  the  problem.  The  former  condition  is  more  difficult  to  screen,  but  normal 
equations  which  were  not  uniquely  solvable  (ie,  the  condition  of  having  a  noninvertible  matrix 
A)  seldom  arose  in  this  research.  It  is  felt  that  taking  k  greater  than  or  equal  to  2(m  + 1)  in  step 
(e),  rather  than  merely  greater  than  or  equal  to  the  absolute  minimum  m  +  1,  served  to  help 
assure  unique  solvability,  by  decreasing  the  likelihood  of  inadvertently  creating  noninvertible 
matrices  A. 

3. 5  Using  Locally  Best  Number  of  Nearest  Neighbors 

Suppose  that  one  has  used  the  DVS  algorithm  with  a  particular  time  series  and  found 
the  optimal  m  and  k  values  for  a  particular  choice  of  fitting  and  testing  set  sizes.  Suppose 
furthermore  that  a  prediction  is  desired  of  the  time  series  value  at  time  Nj  +  Nt  +  T.  One 
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Figure  17.  DVS  Plots  of  Higher  Dimensional  Data  (from  raster  traversal) 

clear  choice  of  prediction  methodology  is  to  apply  local  linear  prediction  at  the  time  Nj  +  Nt, 
using  the  optimal  m  and  k\  that  is,  simply  extend  the  testing  set  m-tuples  to  the  the  most 
recent  one  and  allow  the  algorithm  to  compute  one  more  prediction.  Since  this  requires  very 
little  modification  of  the  DVS  algorithm,  it  will  be  referred  to  as  DVS  prediction. 

Recall,  however,  that  the  optimal  (m,  k)  pair  was  found  globally  over  the  entire  testing 
set.  Suppose  that  instead  of  using  the  number  k  of  nearest  neighbors  found  optimal  for 
the  entire  testing  set,  one  used  instead  the  number  of  nearest  neighbors  found  optimal  for 
predicting  from  the  one  m-tuple  in  the  testing  set  which  is  closest  to  the  m-tuple  ending  at 
Nf  +  Nt.  In  terms  of  Figure  15,  the  prediction  from  the  m-tuple  x„  where  i  =  Nj  +  Nt, 
is  based  on  the  number  of  neighbors  found  best  for  predicting  from  the  m-tuple  Xj(i)  where 
now  Xj^l)  is  understood  to  be  in  the  testing  set  rather  than  the  fitting  set.  The  intuition  is  that 
different  regions  of  the  attractor  may  be  populated  more  densely  with  embedded  points  than  are 
other  regions,  and  more  populated  regions  may  predict  more  accurately  with  fewer  numbers  of 
nearest  neighbors  determining  the  linear  regression  hyperplane.  That  is,  the  effect  of  including 
neighbors  which  are  too  far  away  may  be  to  skew  the  prediction  hyperplane  from  its  optimal 
orientation.  Using  only  the  number  of  nearest  neighbors  found  locally  optimal  may  reduce 
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the  skewing  as  compared  to  using  the  number  of  neighbors  found  globally  optimal.  Similarly, 
sparsely  populated  regions  may  predict  more  accurately  using  more  nearest  neighbors. 

Mixed  results  were  obtained  when  this  technique  was  used.  It  was  applied  to  the  time 
series  “A.dat”  used  in  the  1991  Santa  Fe  Institute’s  time  series  prediction  competition.  This 
data  was  obtained  from  an  infrared  laser  experiment.  It  is  arguably  low-dimensional  chaotic 
and  noise-cormpted  (41:32,325).  Figure  18  shows  the  1000  data  values  made  available  to  the 

A.dat  value 


Figure  18.  Experimentally  Obtained  Laser  Data 

contestants.  This  data  was  used  in  this  research  without  differencing  or  normalization.  The 
DVS  algorithm  was  applied  to  this  data  using  Nj  =  800  and  ten  successive  values  of  Nt 
beginning  with  Nt  =  180.  In  it,  the  parameters  used  were  m  =  4and7'  =  T  =  l  (although 
a  smoothing  advantage  may  have  been  realized  by  taking  T  =  t  >  1);  the  globally  best 
number  k  of  nearest  neighbors  was  recomputed  with  each  new  value  of  Nt.  The  predictions 
are  summarized  in  Table  1.  Notice  that  although  the  algorithm  which  used  the  locally  best 
number  of  nearest  neighbors  had  a  far  higher  total  error,  it  actually  came  closer  to  the  correct 
result  than  the  DVS  algorithm  half  the  time. 

A  similar  comparison  was  made  of  the  two  algorithms  using  differenced  and  normalized 
data  from  the  Lorenz  equations.  Once  again,  the  DVS  algorithm  had  a  lower  total  error,  but 
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Table  1.  Laser  Data  Predictions 


time  series  index 

true  val 

DVS  preds 

locally  best  k  preds 

981 

57.0000 

57.0509 

58.5805 

982 

21.0000 

20.0848 

20.3972 

983 

13.0000 

13.3638 

13.5735 

984 

14.0000 

13.5747 

13.6128 

985 

27.0000 

26.9606 

37.6951 

986 

87.0000 

83.8718 

83.7341 

987 

179.0000 

182.4079 

178.1631 

988 

103.0000 

104.2120 

102.7556 

989 

33.0000 

31.3810 

31.6529 

990 

15.0000 

15.1364 

15.1646 

RMS  error  using  DVS  preds  =  1 .6329 

RMS  error  using  locally  best  k  preds  =  3.61 93 

this  time  only  slightly  lower.  The  DVS  algorithm  was  more  clearly  superior  in  a  test  of  data 
generated  from  the  Henon  map. 

Prediction  using  the  optimal  k  for  the  one  embedded  testing  set  vector  nearest  the 
prediction  vector  resulted  in  increased  RMS  prediction  error  compared  to  the  DVS  algorithm. 
This  approach  was  therefore  abandoned.  It  may  be  possible  to  improve  upon  the  approach 
by  using  an  averaged  value  of  the  best  k's  for  several  nearest  embedded  testing  set  vectors, 
rather  than  just  one.  This  modification  may  reduce  the  likelihood  of  outlier  predictions  such 
as  occurred  at  time  series  index  985  in  Table  1. 


3.6  Pruning  by  Variance  from  Regression  Hyperplane 

The  DVS  algorithm  often  supplies  a  larger  number  of  nearest  neighbors  than  necessary 
for  a  unique  solution  of  the  normal  equations.  It  was  found  that  a  sort  of  local  filtering  can  often 
yield  increased  prediction  accuracy  by  eliminating  from  the  utilized  set  of  nearest  neighbors 
some  which  might  be  considered  noise  corrupted. 

Recall  that  local  linear  prediction  fits  a  least  squares  hyperplane  to  the  (neighbor,  next 
value)  surface  in  the  vicinity  of  the  last  known  m-tuple  of  the  time  series.  Any  (neighbor, 
next  value)  pairs  which  depart  radically  from  the  presumed  reasonably  smooth  (neighbor,  next 


34 


value)  surface  may  skew  the  computed  least  squares  hyperplane  and  detract  from  the  accuracy 
of  the  prediction.  Consider  Figure  19,  in  which  it  is  presumed  that  the  DVS  algorithm  yielded 


Figure  19.  Pruning  Nearest  Neighbors 

an  optimum  embedding  dimension  m  =  1  and  number  of  nearest  neighbors  k  =  5.  The 
presumed  noiseless  (neighbor,  next  value)  surface  is  shown  as  a  smooth  curve.  The  linear 
regression  hyperplane  determined  by  the  live  neighbors  nearest  x,  is  actually  a  line  when 
m  =  1  and  is  here  denoted  R.  Notice  that  the  “noisy”  neighbors  Xj(i)  and  Xj(5)  pull  R  away 
from  the  true  next  time  series  value  Xj.).! .  If  these  two  points  are  deleted  from  the  set  of  nearest 
neighbors  and  a  new  linear  regression  line  Rnew  is  determined  using  only  the  remaining  three 
neighbors,  then  the  error  in  the  predicted  value  Xj+i  (found  now  on  i?new  instead  of  on  R)  is 
reduced.  Some  criterion  must  be  adopted  to  determine  how  far  away  from  R  the  (neighbor, 
next  value)  points  must  be  to  warrant  exclusion.  In  this  work,  the  average  fi  and  standard 
deviation  a  of  the  distances  of  all  next  values  from  R  was  determined,  and  points  falling 
outside  ±{n  +  7<t)  bounds  of  R  were  excluded.  Here  the  exclusion  parameter  7  is  a  real 
number  of  small  magnitude,  typically  between  zero  and  four.  The  exclusion  zone  in  Figure  19 
is  the  region  outside  the  band  between  R+  and  R-. 

This  idea  generalizes  to  embedding  dimensions  greater  than  one.  As  a  prediction 
algorithm,  it  will  be  referred  to  as  pruned  outliers  prediction.  As  will  be  shown,  it  was 


35 


generally  found  capable  of  improving  prediction  accuracy  compared  to  the  DVS  algorithm. 
Except  for  extremely  stochastic  time  series,  prediction  improvement  occurred  for  most  values 
of  7  chosen  in  the  range  of  zero  to  four.  Roughly,  pmning  anywhere  from  just  one  neighbor, 
up  to  about  ten  percent  of  the  DVS-specified  neighbors,  resulted  in  improved  prediction 
accuracy.  Monitoring  the  number  of  neighbors  pmned  during  algorithm  implementation 
allowed  verification  of  compliance  with  this  range.  The  only  data  set  for  which  prediction 
improvement  was  not  observed  was  financial  data  consisting  of  tick-by-tick  Swiss  franc 
exchange  rates.  This  data  set  is  examined  in  Section  3.8. 

3. 7  Pnming  by  Shared  Local  History 

Choosing  a  good  delay  time  r  for  predicting  a  chaotic  time  series  is  not  a  trivial  problem. 
Because  chaos  displays  sensitive  dependence  on  initial  conditions  (Appendix  A),  the  delay 
time  should  not  be  too  large.  On  the  other  hand,  if  the  data  is  from  a  finely  sampled  continuous 
time  system,  sampling  too  closely  will  result  in  overlapping  points  in  most  reasonably  sized 
embedding  spaces.  For  example,  if  a  sequence  contains  20  consecutive  identical  values 
because  of  oversampling,  then  an  embedding  space  of  dimension  19  has  at  least  two  repeated 
points  (assuming  r  =  1).  Furthermore,  if  a  time  series  is  quite  noise-corrrupted,  taking  r  too 
small  accentuates  the  effect  of  the  noise.  Fraser  and  Swinney  ( 1 3)  suggest  determining  a  good 
delay  time  from  the  mutual  information  curve  constructed  from  the  given  data,  although  their 
approach  was  not  taken  in  this  research. 

With  prediction  accuracy  the  ultimate  criterion  for  the  choice  of  r,  it  seems  reasonable  to 
choose  this  parameter  directly  from  multiple  runs  of  the  DVS  algorithm.  For  example,  once  an 
optimal  embedding  dimension  m  and  number  k  of  nearest  neighbors  has  been  obtained  using 
T  =  T  =  1,  the  algorithm  can  be  applied  several  more  times  with  varying  values  of  r  =  T 
and  with  the  optimal  m  fixed.  Exhaustive  search  of  the  resulting  data  can  reveal  which  value 
of  T  gives  best  results  for  the  selected  m.  Values  of  embedding  dimension  near  the  winning 
value  m  could  also  be  so  examined,  but  this  approach  was  not  applied  in  this  work.  For  the 
data  considered  in  this  work,  r  =  1  almost  always  yielded  greatest  accuracy.  This  is  probably 
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because  the  data  was  not  usually  given  in  oversampled  form,  and  any  r  >  2  necessarily 
involves  discarding  some  available  information.  Consider,  for  example,  the  data  represented 
by  the  plot  in  Figure  20.  This  plot  depicts  variations  in  the  blood  oxygen  concentration  of  a 


oxy  concentration 


seconds*2 


Figure  20.  Sleep  Apnea  Blood  Oxygen  Concentration 

sleep  apnea  patient.  It  was  taken  from  data  set  Bl.dat  of  the  Santa  Fe  Institute’s  time  series 
prediction  competition  (41 :4).  Lines  678 1  through  9480  of  file  B 1  .dat  (2700  data  points)  were 
chosen  because  these  values  correspond  to  a  lengthy  period  of  uninterrupted  stage  2  sleep. 
After  differencing  and  normalizing  this  data,  DVS  plots  were  made  to  determine  an  optimal 
embedding  dimension  m  and  number  k  of  nearest  neighbors.  Figure  21  shows  a  few  of  the 
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Figure  21.  Sleep  Apnea  DVS  Plots 
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embedding  dimensions  considered,  including  the  dimension  m  =  14  which  yielded  the  lowest 
noimalized  RMS  error  (atk  =  286).  A  commitment  is  made  to  the  dimension  m  =  14  and  a 
search  for  values  of  r  which  might  yield  lower  error  is  performed  by  examining  the  outputs  of 
the  DVS  algorithm  using  a  range  of  r  values  (in  this  case,  r  =  T  =  1  through  t  =  T  —  12). 
Table  2  shows  a  subset  of  this  range  (r  =  4  through  t  =  8);  no  value  of  t  (whether  displayed 


Table  2.  Selected  Error  Values  (Sleep  Apnea  Data,  m  =  14) 


mm 

r  =  4 

r  =  5 

r  =  6 

T  =  7 

r  =  8 

30 

0.7704 

1.0498 

0.8999 

0.9554 

1.0440 

31 

0.7553 

1.0100 

0.8867 

0.9621 

1.0181 

32 

0.7489 

0.9952 

0.8839 

0.9543 

1.0090 

34 

0.7514 

0.9220 

0.8594 

0.9161 

0.9836 

38 

0.7159 

0.8789 

0.7898 

0.8821 

0.9413 

46 

0.6706 

0.8036 

0.7713 

0.8007 

0.8919 

62 

0.6580 

0.7717 

0.7140 

0.7317 

0.8293 

94 

0.6259 

0.7213 

0.6856 

0.6912 

0.7316 

158 

0.6093 

0.6866 

0.6606 

0.6570 

0.6834 

286 

0.5885 

0.6622 

0.6474 

0.6491 

0.6595 

542 

0.6516 

0.6450 

0.6557 

0.6604 

1054 

0.5804 

0.6437 

0.6334 

0.6498 

0.6625 

2078 

0.5716 

0.6451 

0.6354 

0.6588 

0.6389 

in  Table  2  or  not)  yields  a  lower  value  of  normalized  RMS  error  than  the  0.4974  recorded  at 
T  =  1. 

Notice  in  Table  2  that  normalized  RMS  error  often  does  not  increase  monotonically 
with  r  for  a  fixed  value  of  k  (eg,  for  k  =  158).  It  was  discovered  in  the  course  of  this  work 
that  for  some  time  series,  for  a  fixed  m  and  fixed  k,  there  are  two  or  more  values  of  r  which 
give  particularly  low  error  values.  As  will  be  shown,  this  was  often  the  case  with  financial 
time  series.  A  technique  was  developed  to  combine  the  predictive  advantages  of  two  r’s 
corresponding  to  the  lowest  error  values,  and  it  was  often  able  to  increase  prediction  accuracy 
compared  to  DVS  prediction. 

Let  Ti  denote  the  smaller  of  the  two  optimal  t  values  and  let  tj  denote  the  larger. 
Prediction  with  this  technique  proceeds  basically  the  same  as  DVS  prediction,  using  DVS- 
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determined  values  of  m  and  k  and  with  r  =  ti  =  T  =  Ti.  Some  of  the  k  nearest  neighbors  are 
eliminated,  however,  before  the  prediction  hyperplane  is  determined.  The  neighbors  pmned 
correspond  to  intervals  in  time  which  do  not  overlap  intervals  relevant  to  prediction  using 
72.  Intuitively,  if  the  data  corresponding  to  a  particular  interval  of  time  are  not  prediction¬ 
relevant  to  both  short  term  and  long  term  prediction,  then  that  data  might  beneficially  be 
eliminated  from  the  set  of  prediction-determining  points  in  the  embedding  space.  Perhaps  the 
nonoverlapping  data  represents  noise. 

Consider  Figure  22,  for  example,  in  which  a  prediction  of  the  time  series  value 
at  time  iV  -1-  1  is  desired.  It  is  assumed  in  Figure  22  that  for  the  embedding  dimension 
m  =  2  and  some  number  k  of  nearest  neighbors,  ri  =  3  and  t2  =  5  have  been  found  optimal. 
Prediction  occurs  from  the  point  X(7v+i)-ti  in  the  n  embedding  space  using  neighbors  nearest 
X(;v+i)_Ti  in  the  fitting  set.  Not  all  of  the  fc  nearest  neighbors  are  used,  though.  Short  solid 
arrows  in  Figure  22  indicate  the  endpoints  of  intervals  in  the  fitting  set  corresponding  to  some 
of  the  k  points  in  the  ti  embedding  space  which  are  closest  to  X(Ar+i)-ri.  The  long  solid 
arrows  indicate  the  endpoints  of  an  interval  corresponding  to  one  of  the  k  nearest  neighbors 
to  the  point  (a:(/v+i)-2T2>®(/v+i)-T2)  in  its  embedding  space  (the  T2  embedding  space  is 
not  illustrated).  It  is  assumed  that  none  of  the  intervals  of  length  five  corresponding  to  the  k 
nearest  neighbors  of  {x^n+i)-2T2  >  ^(n+i)-T2)  overlap  the  interval  [r  -  ri ,  r] .  Since  there  is  no 
overlap,  the  point  x^  is  eliminated  from  the  set  of  points  which  will  determine  the  prediction 
hyperplane.  On  the  other  hand,  the  interval  [g  —  Tx,q]  does  overlap  the  interval  [p  —  72,p], 
so  the  point  x,  is  retained.  Care  is  taken  that  at  least  2(m  -|- 1)  nearest  neighbors  are  retained 
regardless  of  overlap,  and  local  linear  prediction  proceeds  as  described  in  Section  3.4. 

The  algorithm  just  described  will  be  referred  to  as  overlap  prediction.  A  number  of 
variations  of  it  come  readily  to  mind.  There  is  no  inherent  need  to  combine  only  two  optimal 
r’s,  and  various  schemes  can  be  devised  to  eliminate  neighbors  based  on  complete  or  partial 
isolation  of  prediction-relevant  intervals  when  intervals  of  several  different  lengths  are  used. 
Even  in  the  case  of  only  two  optimal  r’s,  some  benefit  might  be  gained  by  requiring  strict 
inclusion  of  smaller  intervals  (as  illustrated  in  Figure  22)  rather  than  merely  overlap.  Never- 
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time  series  indices  (here  m  =  2,  Xj  =  Tj  =  3  ,and  X2  =  T2  =  5  ) 


Figure  22.  Overlap  Prediction 


theless,  overlap  prediction  as  presented  above  often  gives  prediction  accuracy  improvement 
for  many  time  series.  Table  3  compares  DVS  predictions,  pmned  outliers  predictions,  and 


Table  3.  Selected  Predicted  Values  (Sleep  Apnea  Data,  m  =  14) 


ts  index 

tme  val 

DVS  preds 

pmned  outliers  preds 

overlap  preds 

2671 

1.4624 

0.4120 

0.3051 

0.2894 

2672 

0.3710 

1.0757 

1.1453 

0.1562 

2673 

1.3210 

0.6858 

0.7350 

0.3807 

2674 

0.9370 

0.7490 

0.6667 

0.4318 

2675 

0.8359 

0.7923 

0.7386 

0.6121 

2676 

0.3104 

0.6682 

0.4319 

-0.0132 

2677 

0.9167 

0.8379 

0.8287 

0.5865 

2678 

-0.1342 

0.5722 

0.4713 

-0.0879 

2679 

0.0477 

0.8037 

0.6235 

0.6758 

2680 

0.0679 

0.0634 

RMS  error 

0.5737 

0.5621 

0.5676 

overlap  predictions  for  the  sleep  apnea  time  series.  In  applying  DVS  prediction,  the  number  k 
of  nearest  neighbors  used  was  286,  and  r  ani  T  were  set  to  one.  In  applying  pmned  outliers 
prediction,  the  same  values  of  fc,  r  and  T  were  used,  end  the  'parameter  7  which  determines 
the  exclusion  region  was  set  to  two.  A  smaller  number  of  nearest  neighbors  was  used  with 
overlap  prediction;  here  A:  =  158  was  used  to  avoid  swamping  the  available  time  with  candi¬ 
date  intervals  (thereby  making  pmning  unlikely  to  occur).  Although  a  wider  range  of  r’s  was 
considered  than  is  shown  in  Table  2,  this  table  nonetheless  illustrates  why  the  values  of  n  and 
Tj  were  chosen  as  four  and  seven  respectively. 


3. 8  Some  Other  Time  Series 

The  pmned  outliers  and  overlap  prediction  algon*hms  yielded  improvement  over  DVS 
prediction  for  many,  but  not  all,  types  of  time  series  The  laser  data  discussed  in  Section 
3.5  was  differenced  and  normalized  (as  described  in  Section  3.4),  then  analyzed  using  the 
three  algorithms.  Data  generated  by  the  DVS  algorithm  (with  Nj  =  560  and  Nt  =  439) 
revealed  an  optimal  embedding  dimension  of  seven  and  number  of  nearest  neighbors  A:  =  17. 
Further  mns  of  this  algorithm  with  m  fixed  at  seven  and  with  varying  delay  times  t  produced 
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data  from  which  a  number  48  of  nearest  neighbors  was  selected  for  overlap  prediction  using 
Ti  =  4  and  t2  =  7.  For  the  pruned  outliers  alg  rithm,  the  exclusion  parameter  7  was  taken  to 
be  2.0.  Ten  data  values  beyond  those  available  to  the  Santa  Fe  contestants  were  utilized  for 
purposes  of  comparing  predictions  (these  values  being  the  first  ten  points  in  the  continuation 
file  A.cont).  The  results  are  presented  in  Table  4.  Recall  that  at  least  2(m  +  1)  neighbors  are 

Table  4.  Selected  Predicted  Values  (Santa  Fe  Laser  Data,  m  =  7) 


ts  index 

true  val 

DVS  preds 

pruned  outliers  preds 

1000 

1.0807 

1.0734 

1.0734 

1.1939 

1001 

2.3363 

2.2353 

2.2437 

2.0421 

1002 

'  ,2321 

-1.0735 

-0.8828 

1003 

-1.8929 

-1.8868 

-1.8261 

1004 

-0.4832 

-0.4743 

-0.4706 

-0.5686 

1005 

-0.0647 

-0.0745 

-0.0747 

-0.0662 

1006 

0.0675 

0.0743 

0.0743 

-0.1847 

1007 

0.3979 

0.4200 

0.4200 

0.4634 

1008 

1.7195 

1.6223 

1.7646 

1.7197 

1009 

1.7195 

1.7033 

1.7405 

1.7267 

RMS  error 

0.0677 

0.0609 

0.1735 

always  retained  for  determining  the  prediction  hyperplane;  in  this  case,  16.  Thus  the  pruned 
outliers  predictor  can  only  prune  at  most  one  neighbor.  In  fact,  with  7  =  2.0,  it  only  pruned 
one  neighbor  in  six  of  the  ten  predictions;  the  remaining  four  predictions  used  all  17  nearest 
neighbors  and  were  therefore  identical  with  the  DVS  predictions.  Recognizing  the  lack  of 
“orunability”  allowed  by  the  use  of  the  DVS  optimal  k  =  17,  a  larger  number  (48)  of  nearest 
neighbors  was  used  with  overlap  prediction.  Unfortunately,  normalized  RMS  errors  escalate 
quickly  as  the  number  of  nearest  neighbors  increases  from  17.  From  =  17  to  fc  =  48,  there 
is  an  increase  of  normalized  RMS  error  from  0.21  to  0.34  for  r  =  1;  similar  increases  exist 
for  all  examined  t’s.  As  Table  4  reveals,  overlap  prediction  cannot  overcome  the  inherent 
loss  of  prediction  accuracy  associated  with  using  too  many  neighbors. 

The  laser  data  just  examined  is  presumed  to  have  relatively  few  degrees  of  freedom 
associated  with  its  generation.  Its  DVS  plots  reveal  that  it  is  nearer  the  D  (deterministic)  than 
the  S  (stochastic)  extreme.  On  the  other  hand,  certain  types  of  financial  data  have  higher 
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numbers  of  nearest  neighbors  associated  with  their  optimal  prediction,  even  relative  to  the 
‘'.ometimes  larger  embedding  dimensions.  Accordingly,  their  DVS  plots  reveal  more  stochastic 
behavior,  or  higher  dimensional  dynamics.  Such  time  series  are  less  accurately  predictable 
than  those  coming  more  from  the  deterministic  extreme,  but  it  is  easier  to  enhance  whatever 
predictability  there  is  using  the  pmned  outliers  and  overlap  prediction  algorithms. 

Figure  23  shows  daily  opening  bids  for  Standard  and  Poor’s  500  futures  contracts  for 


S  and  P  value 


Figure  23.  Standard  and  Poor’s  Bid  Prices 


the  period  from  20  October  1988  to  26  February  1993;  it  contains  a  total  of  1 100  data  points. 
These  data  were  differenced,  then  normalized  to  zero  mean  and  unit  variance  (as  described  in 
Section  3.4).  The  resulting  time  series  (containing  1099  data  points)  is  shown  in  Figure  24. 
This  time  series  was  then  processed  by  the  DVS  algorithm  (using  Nj  =  990  and  Nt  =  90) 


Differenced  and  normalized  S  and  P  values 


Figure  24.  Differenced  S  and  P  Data,  Normalized  to  Zero  Mean  and  Unit  Variance 
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to  establish  optimal  values  of  embedding  dimension  and  number  of  nearest  neighbors.  These 
were  determined  to  be  m  =  12  and  k  =  154  respectively.  These  values  of  m  and  k  were 
also  used  in  pruned  outliers  prediction,  with  the  exclusion  parameter  7  =  4.0  (see  Section 
3.6).  For  overlap  prediction,  the  smaller  value  m  =  3  was  chosen  to  avoid  excessive  overlap 
of  intervals;  h^  m  =  12  been  used  in  overlap  prediction  with  a  delay  time  of  r  =  11,  for 
example,  then  each  interval  would  extend  over  12  x  (11  —  1)  =  120  time  series  values. 
Since  there  are  only  1099  time  series  values  available,  the  chances  of  pruning  would  be  quite 
small  for  any  moderate  number  of  nearest  neighbors.  The  value  m  =  3  was  chosen  not  only 
because  3  is  small  but  also  because  the  normalized  RMS  errors  associated  with  prediction 
using  m  =  3  (0.69  at  A:  =  72,  for  example)  are  only  slightly  higher  than  those  associated 
with  prediction  using  m  =  12  (0.67  at  A:  =  72).  Having  chosen  to  use  m  =  3  with  overlap 
prediction,  a  good  number  of  nearest  neighbors  and  corresponding  good  delay  times  ri  and 
t2  were  selected.  Table  5  shows  some  of  the  t  values  considered,  and  illustrates  why  the 
values  A:  =  72,  Ti  =  3  and  tj  =  11  were  chosen.  Some  results  of  predicting  with  the  various 

Table  5.  Selected  Error  Values  (Standard  and  Poor’s  Data,  m  =  3) 


k 

T  =  7 

|UII 

8 

0.9829 

0.8932 

0.8514 

1.0856 

0.9397 

9 

0.8435 

0.8607 

0.8608 

0.9934 

0.8805 

10 

0.8377 

0.8400 

0.8035 

0.8848 

0.8347 

12 

0.8498 

0.8275 

0.7610 

0.8634 

0.8121 

16 

0.7797 

0.7688 

0.7285 

0.8108 

0.7290 

24 

0.7098 

0.7153 

0.7258 

0.7886 

0.7217 

40 

0.7005 

0.7036 

0.7029 

0.7545 

0.6807 

72 

0.6878 

0.7007 

0.6953 

0.7155 

0.6432 

136 

0.6914 

0.7021 

0.6889 

0.7049 

0.6478 

264 

0.6905 

0.6813 

0.6751 

0.6951 

0.6409 

520 

0.6864 

0.6865 

0.6737 

0.6868 

0.6475 

algorithms  are  presented  in  Table  6. 

Figure  25  shows  daily  opening  prices  for  the  British  pound  for  the  period  from  19 
December  1988  to  27  April  1993.  It  contains  1101  data  points.  This  data  was  differenced, 
then  normalized  to  zero  mean  and  unit  variance.  The  resulting  time  series  (containing  1100 
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Table  6.  Selected  Predicted  Values  (Standard  and  Poor’s  Data) 


ts  index 

true  val 

DVS  preds 

pruned  outliers  preds 

overlap  preds 

1081 

-0.2339 

-0.4028 

-0.3632 

0.1268 

1082 

0.7550 

0.1413 

0.0693 

0.1756 

1083 

0.4932 

0.2114 

0.1757 

0.3174 

1084 

1.5984 

-0.2228 

-0.0974 

0.1111 

1085 

-0.1030 

0.1475 

0.0605 

-0.0095 

1086 

0.0424 

0.1419 

0.2559 

0.0398 

1087 

-0.5829 

-0.0624 

-0.0624 

-0.2704 

1088 

-0.3211 

-0.2619 

0.0126 

-0.0857 

1089 

0.2896 

0.1025 

0.1926 

-0.0146 

1090 

0.1006 

0.0146 

0.0484 

0.0749 

RMS  error 

0.6474 

0.6269 

0.5444 

bp-open  value 


Figure  25.  British  Pound  Opening  Bids 

data  points)  was  then  processed  by  the  DVS  algorithm  (using  Nj  =  990  and  Nt  =  90)  to 
establish  optimal  values  of  embedding  dimension  and  number  of  nearest  neighbors.  These 
were  found  to  be  m  =  6  and  k  =  270  respectively.  These  values  of  m  and  k  were  also 
used  in  pruned  outliers  prediction,  but  now  with  the  smaller  exclusion  parameter  7  =  1.0. 
The  embedding  dimension  m  =  6  was  also  used  for  overlap  prediction;  this  value  of  m  is 
considerably  smaller  than  the  value  m  =  12  found  optimal  for  the  Standard  and  Poor’s  data 
and  therefore  alleviates  excessive  overlap  caused  by  very  long  candidate  intervals.  Examining 
the  DVS  data  for  a  range  of  t  values  from  1  to  14  resulted  in  the  selection  of  fc  =  46  as 
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the  baseline  number  of  candidate  intervals  of  lags  ti  =  3  and  t2  =  14.  Some  prediction 
results  using  these  parameters  are  presented  in  Table  7.  Although  Table  7  reflects  the  use  of 


Table  7.  Selected  Predicted  Values  (British  Pound  Data) 


ts  index 

true  val 

DVS  preds 

pruned  outliers  preds 

overlap  preds 

1081 

-0.1036 

0.0986 

0.0962 

0.2003 

1082 

1.0478 

0.1116 

0.1125 

-0.1885 

1083 

0.4289 

0.0594 

-0.0188 

-0.0684 

1084 

1.5371 

0.2028 

0.2133 

-0.1458 

1085 

-1.0246 

-0.0584 

-0.1294 

-0.5622 

1086 

-0.2187 

0.2834 

0.3007 

0.2154 

1087 

0.1411 

-0.1998 

-0.0121 

-0.0080 

1088 

0.0835 

0.0571 

0.0694 

-0.3890 

1089 

1.3788 

-0.0694 

-0.0598 

0.3855 

1090 

1.2205 

0.0858 

0.0930 

0.2508 

RMS  error 

0.8673 

0.8546 

0.8529 

the  exclusion  parameter  7  =  1.0  in  pruned  outliers  prediction,  it  was  found  that  several  other 
small  positive  choices  of  7  also  yielded  accuracy  improvement  over  DVS  prediction. 

Figure  26  displays  2000  values  of  consecutively  obtained  spatial  Fourier  magnitude 


M6O  trajectoiy  14  th  component 


500  1000  1500  2000 


time 


Figure  26.  Fourteenth  Fourier  Component  Data  from  an  Apparently  Moving  M60  Tank 


components  derived  from  an  M60  tank  located  at  the  center  of  a  viewing  sphere  (see  Chapter 
rV).  The  tank  was  viewed  while  the  observer  was  following  a  Lorenz-like  viewing  sphere 
traversaL  and  the  time  series  represents  fourteenth  Fourier  components.  As  usual,  these 
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data  were  differenced,  then  normalized  to  zero  mean  and  unit  variance.  The  resulting  time 
series  was  then  processed  by  the  DVS  algorithm  (using  Nj  =  1820  and  Nt  =  160)  to 
establish  optimal  values  of  embedding  dimension  and  number  of  nearest  neighbors.  These 
were  determined  to  be  m  —  9  and  k  =  36  respectively.  These  values  of  m  and  k  were  used 
in  both  DVS  and  pruned  outliers  prediction.  The  exclusion  parameter  7  =  2.0  was  selected 
for  pruned  outliers  prediction.  A  smaller  value  of  m  than  9  was  sought  for  overlap  prediction 
(to  avoid  excessively  long  candidate  intervals).  DVS  analyses  revealed  low  prediction  errors 
associated  with  prediction  using  m  =  5,  k  =  28,  n  =  3,  and  T2  =  5  (relative  to  neighboring 
values  of  these  parameters),  so  these  values  were  fixed  for  use  in  overlap  prediction.  Some 
prediction  results  using  these  parameters  are  given  in  Table  8. 

Table  8.  Selected  Predicted  Values  (M60  Tank  Data) 


ts  index 

true  val 

DVS  preds 

pruned  outliers  preds 

overlap  preds 

1981 

-0.0860 

-0.0022 

-0.0099 

-0.1011 

1982 

-0.1330 

-0.0197 

-0.0400 

-0.1038 

1983 

0.0800 

-0.0902 

-0.1256 

-0.0989 

1984 

-0.0964 

-0.0207 

-0.0824 

1985 

-0.0072 

-0.0272 

0.0454 

-0.1602 

1986 

-0.2905 

-0.1941 

-0.1920 

-0.1538 

1987 

-0.1957 

-0.1078 

-0.0642 

-0.1997 

1988 

-0.0885 

-0.2379 

-0.1915 

-0.1558 

1989 

-0.3078 

-0.2088 

-0.2951 

-0.3862 

1990 

-0.1371 

-0.3029 

-0.3035 

-0.1956 

RMS  error 

0.1147 

0.1118  0.0958 

It  may  be  that  both  pruned  outliers  and  overlap  prediction  (with  fortuitous  choices  of 
parameters)  improve  upon  DVS  prediction  by  eliminating  noise .  Not  all  noisy  data  is  equally 
amenable  to  prediction  improvement,  however.  The  time  series  illustrated  in  Figure  27  is  a 
plot  of  tick-by-tick  quotes  of  Swiss  franc  exchange  rates,  over  a  period  of  about  two  and  one 
half  weeks  in  late  1 990  (3000  data  points).  It  represents  data  set  C2.dat  of  the  Santa  Fe  Institute 
competition.  The  values  of  this  data  almost  always  change  in  multiples  of  0.0005,  and  seven 
or  more  contiguous  identical  values  in  the  data  are  not  uncommon.  For  this  reason,  direct 
embedding  of  the  data  results  in  considerable  overlap  in  all  but  very  high  dimensional  spaces. 
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Swiss  franc  exchange  rate 


time 


Figure  27.  Plot  of  Santa  Fe  Institute  Swiss  Franc  Data  Set  C2.dat 

To  alleviate  this  problem,  the  time  series  was  sampled  at  every  second  point,  creating  a  time 
series  with  values  spaced  roughly  (but  variably)  seven  minutes  apart.  The  resultant  series  was 
differenced,  then  normalized,  giving  a  time  series  with  1499  data  points.  The  DVS  algorithm 
was  f^plied  using  m  values  from  7  to  14,  with  Nj  =  1300  and  Nt  =  170.  Analysis  of  the 
resulting  data  indicated  best  prediction  accuracies  (such  as  they  were)  at  quite  high  values  of 
k.  TTie  DVS  plots  for  m  =  7,  m  =  8,  and  m  =  9  are  shown  in  Figure  28.  Their  trailing 
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Figure  28.  Swiss  Franc  DVS  Plots 

tails  and  high  error  values  (always  above  0.82)  indicate  data  near  the  stochastic  (as  opposed  to 
deterministic)  extreme  (4).  In  other  words,  these  data  are  likely  either  very  noisy  or  generated 
by  a  dynamical  system  having  a  very  large  number  of  degrees  of  freedom.  For  the  range  of 
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m’s  examined,  the  least  error  was  found  to  otxur  at  m  =  8  with  k  =  1042.  These  were  the 
values  used  with  DVS  prediction.  They  were  also  used  with  pmned  outliers  prediction,  where 
the  exclusion  parameter  7  =  1.0  was  chosen  again.  With  overlap  prediction,  m  =  8  was 
retained,  but  k  =  1042  was  so  large  that  many  potentially  valuable  close  neighbors  would 
likely  be  eliminated  by  the  overwhelming  number  of  candidate  neighbors,  even  by  neighbors 
which  were  quite  distant  and  perhaps  error-accentuating.  Therefore,  when  a  range  of  values 
of  T  were  examined  (from  t  =  1  through  t  =  16),  the  smaller  value  k  =  82  was  selected, 
along  with  ri  =  4  and  ra  =  9.  A  comparison  of  the  three  methodologies  is  presented  in 
Table  9.  Notice  that  all  of  the  methodologies  produce  poor  predictions  (recall  that  RMS  errors 


Table  9.  Selected  Predicted  Values  (Swiss  Franc  Data) 


ts  index 

true  val 

DVS  preds 

pruned  outliers  preds 

overlap  preds 

1471 

-1.3745 

-0.0367 

-0.0049 

-0.1999 

1472 

0.0047 

-0.0506 

-0.0876 

0.4973 

1473 

-0.9148 

-0.0369 

-0.0480 

-0.0314 

1474 

-0.4550 

-0.0665 

-0.0690 

-0.0301 

1475 

0.9242 

-0.0487 

0.0352 

0.2161 

1476 

-1.3745 

-0.0961 

-0.0148 

-0.2729 

1477 

0.0047 

-0.0859 

-0.1370 

0.1076 

1478 

-0.9148 

-0.0612 

-0.0889 

0.6342 

1479 

-1.3745 

-0.1714 

0.0346 

-0.0026 

1480 

0.9242 

-0.0476 

-0.0034 

0.1306 

RMS  error 

0.9178 

0.9472 

0.9602 

of  unity  result  from  merely  predicting  the  average  data  value).  Several  different  choices  of 
parameters  7  were  tried  with  pruned  outliers  prediction,  as  were  several  different  choices  of 
parameters  k,  ti,  and  ra  with  overlap  prediction.  None  of  these  choices  improved  prediction 
accuracy  over  DVS  prediction.  This  is  not  to  say  that  no  choices  could  improve  accuracy. 
Rather,  the  relative  difficulty  of  improving  on  DVS  prediction  accuracy  with  the  Swiss  franc 
data  suggests  it  possesses  a  very  high  degree  of  randomness  and  may  not  be  a  suitable  subject 
for  local  linear  prediction.  Mozer  was  able  to  attain  a  slightly  lower  RMS  error  (0.859)  for 
15-minute-ahead  predictions  using  a  neural  network  (26:260). 
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3.9  Conclusion 


The  DVS  algorithm  has  been  presented  in  some  detail,  and  its  modification  to  a  DVS  pre¬ 
diction  algorithm  has  been  described.  Two  extensions  of  this  algorithm  have  been  developed, 
and  some  results  of  their  implementations  have  been  presented. 

The  first  algorithm,  pruned  outliers  prediction,  excludes  from  the  DVS-determined 
optimal  number  of  neighbors  nearest  the  prediction  point,  those  which  lie  farthest  from 
the  linear  prediction  hyperplane.  A  new  hyperptane  is  calculated  based  on  the  remaining 
neighbors,  and  the  value  assumed  on  this  hyperplane  at  the  prediction  point  becomes  the 
predicted  value.  This  algorithm  was  able  to  predict  more  accurately  than  DVS  prediction  for 
a  number  of  time  series  examined.  For  example,  prediction  improvement  was  noted  using 
Standard  and  Poor’s  financial  data,  where  an  RMS  error  value  of  0.627  was  found  with  pmned 
outliers  prediction,  as  compared  to  0.647  with  DVS  prediction. 

The  second  algorithm,  overlap  prediction,  endeavors  to  combine  the  benehts  of  using 
one  short  and  one  longer  time  delay.  Prediction  is  based  on  the  shorter  of  the  two  time  delays, 
but  intervals  of  time  are  allowed  to  be  represented  in  the  utilized  set  of  nearest  neighbors  only 
if  they  play  a  role  in  both  short  term  and  long  term  prediction.  This  algorithm  was  also  able 
to  predict  more  accurately  than  DVS  prediction  for  some  time  series  examined,  particularly 
financial  time  series.  Prediction  of  the  Standard  and  Poor’s  data  gave  an  RMS  error  of  0.544, 
better  than  either  of  the  other  algorithms 

Both  algorithms  were  therefore  shown  capable  of  improving  prediction  accuracy  over 
DVS  prediction  for  some  (inic  seilcs  Other  time  seric:  seemed  less  susceptible  to  im¬ 
provement.  Numerous  experiments  with  Swiss  franc  exchange  rate  data  yielded  decreased 
prediction  accuracy  compared  to  DVS  prediction.  In  one  typical  experiment,  the  RMS  errors 
were  0.918, 0.947,  and  0.960  for  DVS,  pruned  outliers,  and  overlap  predictions,  respectively. 
It  is  suspected  that  the  decreased  susceptibility  to  improvement  with  Swiss  franc  data  may 
indicate  that  this  data  is  more  stochastic  (as  implied  by  Figure  28)  than  most  of  the  other  time 
series  examined. 
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A  total  of  60  predictions  have  been  presented  (ten  for  each  of  six  time  series)  comparing 
the  DVS,  pruned  outliers,  and  overlap  prediction  methods.  The  total  RMS  errors  associated 
with  all  of  these  60  predictions  are  0.627  for  DVS  prediction;  0.626  for  pruned  outliers 
prediction;  and  0.620  for  overlap  prediction.  When  the  predictions  associated  with  the  highly 
stochastic  Swiss  franc  data  are  excluded,  the  50  remaining  predictions  reveal  total  RMS  errors 
of  0.551  for  DVS  prediction;  0.539  for  pruned  outliers  prediction;  and  0.526  for  overlap 
prediction.  These  numbers  suggest  that  the  two  algorithms  newly  developed  in  this  research 
may  offer  prediction  advantage  for  time  series  which  are  not  highly  stochastic. 

A  drawback  to  both  pmned  outliers  and  overlap  predictions  is  that  they  require  selection 
of  parameter(s)  which  may  not  always  enhance  accuracy. 
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IV,  Spatio-Temporal  Classification  Based  on  Fractal  Dimension 

4.1  Introduction 

This  chapter  presents  an  application  of  the  embedding  theorem  to  the  problem  of  moving 
object  classification.  In  this  problem,  shape  feature  vectors  of  several  classes  of  objects  are 
identified.  As  these  objects  move  through  space  (or  appear  to  move  through  space),  sequences 
of  views  are  taken  and  translated  into  sequences  of  feature  vectors.  The  evolutions  of  the 
individual  components  of  these  feature  vectors  comprise  a  set  of  time  series.  Depending  on 
how  the  views  are  obtained,  these  time  series  may  reveal  an  underlying  fractal  nature  to  the 
dynamical  systems  defined  by  the  evolving  feature  vectors.  It  is  shown  that  the  embedding 
theorem  may  be  applied  to  the  highest  of  the  fractal  dimensions  corresponding  to  the  objects 
to  set  a  reasonable  length  for  the  test  sequences  presented  for  classification. 

The  following  section  will  describe  an  experimental  method  of  obtaining  views  of 
objects.  In  Section  4.3,  a  proof  is  given  that  a  monotonic  mapping  of  a  time  series  yields 
another  time  series  with  the  same  fractal  dimension.  Section  4.4  considers  an  example 
classification  problem  using  feature  vectors  consisting  of  Fourier  components  of  the  viewed 
objects.  A  nearest  neighbor  spatio-temporal  classifier  is  described  in  Section  4.5,  and  some 
results  of  its  application  to  the  example  classification  problem  are  presented.  Section  4.6 
discusses  strategies  for  viewing  sphere  traversals,  pointing  out  that  chaotic  trajectories  may 
not  be  necessary  for  fractal  dimension-based  classification. 

4.2  Creating  Spatio-Temporal  Training  Sequences 

Building  on  earlier  work  by  Rabiner  and  others  (30)  (34),  Capt  Ken  Fielding  has 
implemented  a  Hidden  Markov  Model  spatio-temporal  classifier  (12, 10, 11,9).  The  sequences 
of  views  he  uses  are  obtained  by  traversing  (in  computer  simulation)  a  portion  of  the  surface 
of  a  sphere  having  at  its  center  a  three-dimensional  model  of  the  object  of  interest.  The  portion 
of  the  surface  traversed  will  be  called  the  viewing  quadrant,  although  it  doesn’t  quite  fill  a 
quadrant  of  the  sphere.  Fielding’s  technique  can  be  applied  to  sets  of  data  corresponding  to 
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aibitrary  traversals  simulating  a  wide  range  of  possible  object  motions.  Five  objects  (models 
of  military  land  vehicles)  are  positioned  at  the  center  of  the  sphere  and  provide  hve  classes 
for  recognition.  His  Hidden  Markov  Models  can  be  trained  on  sequences  of  feature  vectors 
extracted  from  lengthy  training  sequences  of  views.  Their  performance  is  then  judged  using 
short  test  sequences  of  such  views  which  are  typically  trajectories  near,  but  not  entirely 
overl^ping,  the  training  trajectories. 

One  might  wonder  how  best  to  assemble  training  sequences.  Perhaps  a  region-filling 
curve  composed  of  segments  found  most  typical  of  actual  vehicle  movements  (derived,  for 
example,  from  observations  made  during  training  exercises)  might  most  accurately  represent 
real-world  scenarios.  Even  if  it  is  practical,  however,  to  train  with  sequences  containing 
subsequences  near  every  possible  test  sequence,  some  critical  questions  have  to  be  answered. 
How  is  “nearness”  measured?  What  minimum  length  test  sequence  should  be  used? 

The  problem  of  constmcting  an  optimal  training  trajectory  is  not  addressed  here.  Instead, 
simplifying  assumptions  are  made  on  the  vehicle-viewer  orientations  to  produce  training 
trajectories  on  the  viewing  sphere,  and  the  fractal  dimensions  of  the  derived  feature  vector 
trajectories  are  then  exploited  to  infer  a  sufficient  lower  bound  for  test  sequence  lengths. 

Suppose,  for  example,  that  a  training  trajectory  is  chosen  which  has  a  known  fractal 
nature.  Is  it  possible  to  produce  a  sequence  of  feature  vectors  which  derives  a  fractal  nature 
directly  from  that  of  the  training  trajectory?  At  least  for  a  simple  feature  extracted  from  a 
simple  object,  the  answer,  it  will  be  shown,  is  yes.  But  first  a  necessary  digression. 

4.3  Time  Series  Having  Identical  Fractal  Dimension 

The  two  time  series  shown  in  Figure  29  have  the  same  fractal  dimension;  a  new  theorem 
will  be  proven  in  this  section  to  verify  this  assertion.  The  reader  who  chooses  not  to  skip  this 
development  may  wish  to  consult  Appendix  A  for  some  basic  concepts  and  terminology. 

Figure  30  shows  two  disjoint  fractal  objects  (differently  warped  nested  triangles  in  the 
upper  left  and  lower  right).  Barnsley  provides  a  theorem  from  which  it  may  be  deduced  that 
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Figure  29.  Stretching  a  Hme  Series 


these  objects  have  the  same  fractal  dimensions.  This  theorem  may  also  be  used  to  demonstrate 
that  the  two  time  series  of  Figure  29  have  the  same  fractal  dimensions. 

In  both  Figure  29  and  Figure  30,  the  right-side  object  is  the  image  of  the  left-side  object 
under  a  mapping  known  as  a  metric  equivalence. 

Definition:  Two  metric  spaces  {Xi,di)  and  (X2,  da)  are  metrically  equivalent  if  there 
is  a  bijective  mapping  h  :  Xi  X2  and  constants  0  <  ci  <  ca  <  00  such  that  for  all  x  and 
y  in  Xi, 

cidi{x,y)  <  d2{h{x),h{y))  <  cadi(x,j/) 

In  this  context,  the  mapping  h  will  be  called  a  metric  equivalence, 

A  metric  equivalence  may  be  thought  of  as  a  function  which  stretches  or  compresses 
one  space  into  another,  but  in  such  a  way  that  distances  between  points  are  neither  increased 
nor  decreased  unboundedly.  The  linear  mapping  /  :  (0, 1]  [0,2],/(x)  =  2x,  is  an  example 

of  a  metric  equivalence  between  the  metric  spaces  Xi  =  [0, 1]  and  X2  =  [0, 2],  where  both 
metrics  di  and  da  are  assumed  Euclidean. 

Barnsley  provides  a  proof  of  the  following  theorem  (2:180). 

Theorem  1.  Let  (Xi,  di)  and  (Xa,  da)  be  metrically  equivalent  metric  spaces  with  a  metric 
equivalence  h  •.  X\  -*  X2.  Suppose  a  nonempty  compact  subset  Ai  of  Xi  has  fractal 
dimension  D.  Then  its  image  h(Ai)  also  has  fractal  dimension  D. 
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Figure  30.  Two  Objects  in  with  Same  Fractal  Dimension  (2:173) 

There  is  a  metric  equivalence  h  :  (2, 3)  -*■  (4, 9)  between  a  pair  of  Euclidean  spaces 
which  contain  the  two  time  series  illustrated  in  Figure  29,  namely,  h{x)  =  i^.  It  is  because 
this  metric  equivalence  extends  to  the  embedding  spaces  in  which  the  fractal  dimensions  of 
the  time  series  are  computed  that  the  equivalence  of  their  dimensions  may  be  asserted. 

Theorem  2.  Suppose  Xi  €  (a, 6)  fori  =  1,2,3,...  and  the  time  series  {x,}  has  fractal 
dimension  d.  Suppose  g  :  {a,b)  ^  is  one-to-one  and  differentiable,  and  satisfies 

0<a<  ^  </3 

for  some  real  constants  a  and  ^  and  for  all  z  €  (a,  b).  Then  the  time  series  also  has 

fractal  dimension  d. 

Proof:  By  the  embedding  theorem,  there  is  an  integer  k  >  2d  such  that,  in  the  space 
(a,  6)*  with  Euclidean  metric,  the  points  x<  =  (x,_jt+i,  Xj_fc+2,  •  •  • ,  i  >  fc,  lie  on  an 
attractor  having  fractal  dimension  d.  Define  a  vector  function  g  :  ((a,  6)*',  Euclidean)  — » 
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([5f(a,  i)]*,  Euclidean)  by 


yi  9{yi) 

y2  9(92) 

•  • 

•  • 

^yk )  9iyk) 

for  all  y  =  (yi,  ya, . . . ,  yk)^  in  (a,  6)*.  It  will  be  shown  that  g  provides  an  equivalence  of  the 
naetric  spaces  (a,  6)*  and  [y(a,  6)]*=  (both  with  assumed  Euclidean  metrics).  Theorem  1  then 
proves  the  assertion  about  {y(a:.)}. 

Let  y  =  (yi,  ya, . . . ,  y*)^  and  z  =  (zi,  ^2,  •  •  • ,  Zk^  be  any  elements  of  (a,  6)*,  and 
let  i  G  {1,2, . . . ,  k}.  By  the  mean  value  theorem  of  calculus,  there  is  a  number  c,  in  (y,-,  2,) 
(or  in  {zi,  y,)  if  2,  <  y„  or  equal  to  their  common  value  if  y,  =  2,)  such  that 

9{zi)  -  y(y.)  =  y  (ci)(2.-  -  y.) 

By  hypothesis,  a  <  |y'(c.)|  <  0,  and  since  |y(2,)  -  y(y<)|  =  |y'(ci)|k<  -  y,|, 

-  yil  <  \9{zi)  -  y(yi)l  <  0\zi  -  y^j 

-  !/<l*  <  \9{zi)  -  giViW  <  ~  y.f 

This  is  true  for  every  *  €  {1,2, . . . ,  jfc}  so 

a’  H  \^i  -  ^  -  J/’f 

«=i  t=i  ,=i 

/k  \V2  /fc  ,1/2  1/2 

« (^E  l^‘  -  y*f  j  <  ^  ^  (E  I^«  -  y«l"j 

allz  -  y||  <  ||g(z)  -  g(y)|l  <  j3\\z  -  y|| 

Thus  g  is  indeed  a  metric  equivalence.  ■ 
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Suppose  now  that  the  x-component  of  a  viewing  quadrant  traversal  forms  a  time  series 
3^1)  3^2,  X3, . . .  with  a  known  fractal  dimension.  Suppose  also  that  measurements  are  taken  of 
an  object  at  the  center  of  the  viewing  sphere  from  each  viewing  quadrant  location,  and  those 
n^asurements  are  interpreted  for  classification  purposes  as  representing  a  feature  of  the  object. 
If  that  feature  changes  strictly  monotonically  with  increasing  x  but  not  at  all  with  other  viewing 
location  coordinates,  then  Theorem  2  implies  that  the  sequence  of  feature  vectors  must  have 
the  same  fractal  dimension  as  the  time  series  xi,  X2,  X3, . . ..  Toward  the  goal  of  realizing  a 
feature  space  trajectory  with  a  known  fractal  dimension,  the  next  section  will  consider  a  novel 
viewing  quadrant  training  trajectory. 

4.4  A  Loren?  Traversal  of  the  Sphere  (An  Embedology  Perspective) 

Consider  a  scaled  Lorenz  attractor  confined  within  3-space  to  the  viewing  quadrant  of 
interest,  where  the  stationary  object  is  centered  at  the  origin  and  the  x  and  y  axes  determine 
the  equatorial  plane;  see  Figure  31.  Its  evolving  x  components  generate  a  fractal  times 


viewing 

quadrant 


^  y 


X 

Figure  31.  Viewing  Sphere 


series.  Let  P  denote  the  projection  of  the  confined  attractor  onto  the  viewing  quadrant  surface 
(it  does  not  extend  to  the  edges  of  the  surface,  and  leaves  large  portions  of  the  viewing 
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quadrant  unexplored).  Now  suppose  that  the  object  of  interest  at  the  center  of  the  sphere  is 
a  circular  cylinder  coaxial  with  the  y-axis  of  the  coordinate  frame.  If  the  cylinder  is  painted 
with  two  longitudinal  black  and  white  stripes,  with  white  paint  over  slightly  more  than  the 
region  observed  during  traversal,  then  intensity  could  serve  as  a  feature  which  is  a  monotonic 
transformation  of  the  x -component  of  the  traversal.  Figure  32  illustrates  how  more  white 
becomes  visible  with  increases  in  viewing  position  x-component,  regardless  of  y-component. 


0  degrees  90  degrees  180  degrees 

Figure  32.  A  Monotonically  Increasing  Feature 


Thus  the  trajectory  of  the  intensity  feature  (in  9ft)  is  a  time  series  having  the  same  fractal 
dimension  as  the  Lorenz  attractor. 

Such  simple  objects  are  of  little  practical  interest,  and  features  other  than  intensity  are 
often  used.  Suppose  the  Lorenz  traversal  is  preserved,  but  more  complex  objects  are  viewed 
using  different  features.  In  his  work,  Capt  Fielding  uses  features  consisting  of  vectors  of  28 
dominant  two-dimensional  spatial  Fourier  magnitudes  associated  with  the  views  (12).  That 
is,  28  low  frequency  pairs  are  fixed,  the  spatial  Fourier  transform  is  evaluated  at  each  of  those 
pairs,  and  the  magnitudes  of  those  Fourier  transforms  form  28-dimensional  feature  vectors. 
Thus  each  view  is  condensed  to  a  28-tuple  of  real  numbers.  A  training  sequence  then  consists 
of  a  finite  number  of  such  28-tuples,  and  a  test  sequence  consists  of  a  smaller  number  of 
28-tuples  derived  from  viewing  trajectories  oriented  near  the  viewing  trajectory  used  during 
training. 

Fourier  components  do  not  change  monotonically  with  viewing  position.  Nevertheless, 
they  do  seem  to  change  fairly  continuously  with  viewing  position.  The  five  images  in  Figure  33 
(courtesy  Ken  Fielding)  each  correspond  to  the  flattened  viewing  quadrant.  EacSi  point  within 
each  of  these  images  thus  corresponds  to  a  position  on  the  viewing  quadrant.  The  gray-scale 
intensity  at  each  point  represents  the  value  of  the  fourteenth  Fourier  component  of  the  object 
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at  the  center  of  the  viewing  sphere,  as  viewed  from  that  point.  The  objects  at  the  center 
of  the  sphere  were  models  of  an  M60  tank,  M35  truck,  BTR60  armored  personnel  carrier, 
T62  tank,  and  M2  infantry  fighting  vehicle,  as  labeled.  The  darker  is  the  gray  in  these 
images,  the  higher  is  the  component  value  (with  white  being  zero).  Notice  the  rarity  of  sharp 
discontinuities;  perhaps  this  is  due  to  the  complexity  of  these  vehicles,  i.e.,  the  large  number  of 
parts  with  both  sharp  and  smooth  edges,  no  single  one  of  which  is  dominant.  In  any  case,  the 
distributions  of  fourteenth  component  values  generated  from  the  Lorenz-derived  viewing  path 
appear  rather  continuous.  The  fourteenth  component  values  seemed  typical  in  this  respect  to 
other  components  examined. 

Consider  Figure  34,  in  which  the  viewing  trajectory  P  is  identified.  The  Lorenz  attractor 
is  a  compact  subset  of  (as  is  any  curve  which  is  a  closed  bounded  subset  of  3?^  ( 1 :59)).  Now 
P  is  the  projection  of  the  Lorenz  attractor  confined  to  the  viewing  quadrant,  and  the  mapping 
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Viewing  sphere 


Figure  34.  Embedding  a  Time  Series  Derived  from  Views  along  Trajectory  P 

which  confined  the  Lorenz  attractor  is  a  linear  transformation.  Both  projections  and  linear 
transformations  are  continuous,  so  the  composition  mapping  with  image  P  is  also  continuous. 
Thus  P  is  compact,  since  continuous  images  of  compact  sets  are  compact  (1:82). 

Let  f  denote  the  mapping  of  P  into  the  28-tuples  of  Fourier  vectors  associated  with 
each  viewing  position  in  P.  It  was  argued  above  that  each  of  the  component  functions  of  f 
are  continuous  (at  least,  they  will  be  presumed  continuous);  therefore  f  itself  is  continuous. 
Since  P  is  compact,  the  image  Q  of  P  under  f  is  a  compact  subset  of  3?^*.  Since  all  nonempty 
compact  subsets  of  have  fractal  dimensions,  Q  has  a  fractal  dimension;  call  it  d  (2:183). 

Assume  now  that  the  trajectory  Q  in  3?^®  is  the  solution  trajectory  of  a  system  of 
differential  equations  in  3?^*,  or  the  projection  of  a  solution  trajectory  R  in  a  higher  dimensional 
space  (as  the  two-dimensional  Lorenz  curve  illustrated  in  Figure  6  is  the  projection  of  a  three- 
dimensional  solution  trajectory).  This  requires  that  Q  be  continuous;  but  it  has  been  argued 
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that  the  mapping  f  :  P  Q  is  continuous,  and  since  P  is  continuous,  Q  also  must  be 
continuous  (1:79). 

The  embedding  theorem  can  now  be  applied  to  the  trajectory  Q.  Consider  any  one  of 
the  components  of  Q.  A  delay  coordinate  embedding  of  the  time  series  corresponding  to  that 
component  into  31^^  yields  a  diffeomorphic  image  S  of  Q  (or  of  R,  assuming  now  that  R  has 
fractal  dimension  d),  provided  only  that  k  >  2d. 

Thus,  any  k  consecutive  values  of  any  one  component  of  Q  determine  a  single  point 
on  S.  Consider  now  the  relationship  between  Q  and  S.  For  greater  generality,  suppose  Q  is 
projected  from  a  higher  dimensional  trajectory  R.  The  fundamental  theorem  of  differential 
equations  (18:162)  assures  that  any  point  on  R  determines  all  of  R.  Since  S  and  R  are 
diffeomorphic,  any  single  point  on  S  not  only  determines  all  of  S,  but  its  image  in  R  determines 
all  of  R.  Thus  the  projected  image  of  the  point  on  S  traces  the  trajectory  Q.  Therefore  any  k 
consecutive  views  of  the  object  completely  determine  the  trajectory  Q,  which  will  be  caUed  the 
“object  evolution”  of  the  viewed  object,  to  account  for  its  temporal  as  well  as  spatial  nature. 

Consider  now  the  problem  of  distinguishing  one  object  from  another  by  distinguishing 
one’s  evolution  from  the  other’s;  say,  distinguishing  an  M60  tank’s  evolution  from  that  of  an 
M35  truck.  A  priori,  there  is  no  reason  to  believe  there  is  no  overlap  of  evolutions.  That 
is,  there  may  be  a  point,  or  even  consecutive  points,  shared  by  the  M60  and  M35  evolutions. 
Let  be  the  maximum  of  the  embedding  dimensions  kMeo  and  k^ss  found  sufficient  for 
determining  the  evolutions  of  the  M60  and  M35,  respectively.  Then  provided  there  are  no 
more  than  K  —  I  consecutive  shared  points  on  the  evolutions,  the  M60  and  M35  may  be 
distinguished  simply  ty  matching  a  test  sequence  of  K  28-tuples  (taken  from  one  of  the  two 
evolutions)  to  both  evolutions.  The  evolution  which  matches  identifies  the  object  of  the  test 
sequence.  If  a  maximum  M  of  consecutive  points  on  the  evolutions  are  found  to  overlap, 
where  M  >  K  —  1,  then  M  +  l  points  will  suffice  for  the  test  sequence. 

In  fact,  if  test  sequences  are  only  taken  from  one  or  the  other  known  evolution,  then 
M  +  l  consecutive  test  sequence  points  will  suffice  for  identification,  regardless  of  whether 
M  >  K  —  However,  given  a  test  sequence  of  observations  which  was  obtained  not 
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from  a  partial  traversal  of  P,  but  perhaps  just  near  P,  the  task  of  object  classification  is  more 
problematic.  Intuitively,  the  object  evolution  in  which  most  closely  matches  the  evolution 
determined  by  the  test  sequence,  probably  corresponds  to  the  object  of  the  test  sequence. 
To  maximize  the  probability  of  correct  identification,  however,  at  least  K  consecutive  views 
should  be  included  in  the  test  sequence,  because  K  captures  something  of  the  “dynamical 
essence”  of  the  evolutions.  Consider  Figure  35,  in  which  are  depicted  fanciful  evolutions  for 


Figure  35.  Nonoverlapping  Evolutions  and  a  Test  Evolution 

some  objects  A  and  B,  and  an  evolution  C  corresponding  to  views  of  either  object  A  or  B 
which  were  taken  near  but  not  on  the  viewing  sphere  traversal  common  to  A  and  B.  Since  the 
evolutions  of  A  and  B  are  disjoint,  a  single  view  from  anywhere  on  their  common  viewing 
traversal  will  suffice  to  classify  the  test  object.  However,  one,  or  even  more,  views  taken 
off  the  common  viewing  traversal  may  not  unambiguously  identify  the  object.  If  it  is  found, 
however,  that  K  =  7,  and  seven  or  more  views  are  taken  in  determining  C,  then  a  correct 
classihcation  is  more  likely. 
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4.5  Experimental  Application 


The  Lorenz  traversal  just  described  was  applied  to  obtain  training  sequences  of  views 
of  each  of  five  military  target  classes.  After  the  confined  Lorenz  attractor  was  generated,  the 
(a;,  y)  components  of  its  points  were  projected  onto  the  surface  of  the  sphere,  and  the  resulting 
path  followed  for  2000  points  (repeated  for  each  of  the  five  classes).  At  each  of  the  2000 
views,  Fourier  28-tuples  were  extracted.  Time  series  were  formed  from  the  1st,  14th,  and  28th 
components  of  the  resulting  sequence  of  2000  Fourier  vectors.  Grassberger  and  Procaccia 
analyses  of  each  class  revealed  nearly  linear  log-log  plots  from  which  fractal  dimensions  could 
be  estimated.  See  for  example  Figure  36,  in  which  the  fourteenth  Fourier  component  of  the 
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Figure  36.  G.  and  P.  Plot  for  the  14‘^  Fourier  Component  of  an  M60 


M60  sequence  was  examined.  Based  on  the  slope  of  the  line  joining  the  endpoints  of  this  data, 
the  correlation  dimension  of  the  generating  time  series  is  estimated  at  about  3.96. 

Similar  investigations  were  performed  for  the  1st  and  28th  components  of  the  M60 
data,  and  for  the  1st,  14th,  and  28th  components  of  the  remaining  four  military  vehicles.  The 
maximum  fractal  dimension  so  obtained  (over  all  15  values  computed)  was  5.25;  this  number 
was  used  to  infer  a  minimum  test  sequence  length  of  1 1  >  2  x  5.25. 
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Capt  Fielding,  in  his  work  with  Hidden  Markov  Models,  used  training  sequences  which 
were  not  derived  from  a  Lorenz  viewing  trajectory  (12).  Typically  they  were  derived  from 
roughly  east-to-west  (or  west-to-east)  viewing  trajectories,  and  test  sequences  of  similar 
“horizontal”  orientations  were  used.  E)espite  the  different  training  sequences,  he  found  in  his 
experiments  that  test  sequences  of  length  eleven  were  quite  adequate  for  accurate  classification. 

A  nearest  neighbor  classifier  was  implemented  to  determine  if  test  sequences  of  length  1 1 
were  adequate  for  accurate  vehicle  classification  using  the  Lorenz-derived  training  evolutions. 
Given  a  test  sequence,  this  technique  for  determining  its  class  is  to  match  its  point-by-point 
nearness  to  all  sequences  of  the  same  length  contained  in  aU  the  training  data.  The  class  of 
the  training  sequence  providing  the  best  fit  is  declared  the  most  likely  class.  Consider,  for 
example,  hypothetical  portions  of  five  training  sequences  C'  as  shown  in  Figure  37.  The 
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Figure  37.  Trajectories  in  3?^® 

sampled  test  sequence  in  Figure  37  consists  of  just  seven  points,  and  its  trajectory  through 
phase  space  is  labeled  T.  The  sum  of  the  squared  Euclidean  distances  from  the  seven  points 
Ti  through  T7  to  the  points  Cf  through  C®,  respectively,  is  greater  than  the  sum  of  the  squared 
distances  from  Ti  through  TV  to  the  points  Cj  through  Cj,  respectively.  Indeed,  there  is  no 
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sequence  of  seven  points  in  any  class  which  is  closer  to  the  test  sequence  than  the  seven  points 
C*  through  C7.  The  test  sequence  is  therefore  identified  as  class  four. 

An  equivalent  statement  of  the  classification  procedure  may  be  found  by  imagining  the 
test  sequence  a  single  point  T  in  7  x  28  =  196-space.  That  is,  concatenate  the  28  components 
of  each  of  the  7  points  into  a  single  vector  T  of  dimension  196.  Starting  with  each  point  (prior 
to  the  final  six)  in  all  the  training  classes,  form  196-tuples  from  the  concatenated  components 
of  the  starting  point  and  the  six  points  following  it.  The  class  of  the  point  in  196-space  nearest 
T  in  Euclidean  distance  is  declared  the  class  to  which  T  belongs.  This  is  the  approach  taken 
in  developing  the  nearest  neighbor  algorithm  used  in  this  research. 

Test  sequences  were  obtained  near  the  Lorenz  viewing  trajectory,  but  not  on  it,  by 
solving  the  Lorenz  equations  using  a  different  initial  condition  than  that  used  to  generate  the 
training  trajectory.  Figure  38  shows  the  projected  solution  of  the  Lorenz  equations  when  an 
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Figure  38.  Obtaining  Lorenz  Test  Sequences 


initial  condition  of  (xo  =  — 20,yo  =  15,20  =  15)  was  used.  Ignoring  the  initial  point,  the 
next  20  points  on  this  trajectory  were  used  to  construct  test  sequences  of  lengths  4,  8,  and 
11  (this  allowed  17  sequences  of  length  4,  13  of  length  8,  and  10  of  length  11).  The  20'^* 
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point  is  the  first  point  after  the  knee  of  the  initial  portion  of  the  trajectory;  it  is  located  at 
about  (x  =  —2,  y  =  -4).  Compare  this  trajectory  to  the  trajectory  used  to  derive  the  training 
data.  This  is  illustrated  in  Figure  6,  where  the  initial  condition  used  to  determine  the  trajectory 
was  (xo  =  5,  j/o  =  5,  zq  =  b).  Using  the  test  sequences  with  all  five  vehicles,  classification 
accuracy  was  100%  for  sequences  of  lengths  11  and  8,  and  98%  for  sequences  of  length  4. 
Thus  sequences  of  length  1 1  proved  sufficiently  long  to  classify  these  particular  test  sequences 
perfectly;  but  so  did  sequences  of  length  8,  and  even  sequences  of  length  4  classified  quite 
well. 

There  are  several  possible  explanations  for  these  results.  First,  the  military  vehicles 
may  be  so  different  that  their  training  trajectory-derived  evolutions  in  overlap  very  little, 
or  not  at  all  (for  example,  as  in  Figure  35).  If  this  is  the  case,  very  short  test  sequences 
will  likely  suffice  for  accurate  classification.  Second,  there  could  be  considerable  overlap  of 
evolutions,  but  the  test  sequences  applied  didn’t  fall  near  the  regions  of  overlap.  Third,  the 
embedding  theorem  gives  a  sufficient,  not  necessary,  sequence  length  to  completely  determine 
an  object’s  evolution.  For  example,  eleven  training  sequence  views  of  an  M60  are  sufficient  to 
completely  determine  the  M60’s  evolution  in  3?^*;  a  number  less  than  eleven  may  also  suffice. 

Test  sequences  were  derived  from  Lorenz  solutions  using  a  couple  of  other  initial  con¬ 
ditions,  with  similar  results.  In  both  cases,  using  test  sequences  of  length  eleven  yielded  100% 
classification  accuracy,  but  often  sequences  of  shorter  length  did  the  same.  Perhaps  hundreds 
of  experiments  with  different  test  sequences  could  yield  statistics  which  would  tend  to  confirm 
or  deny  that  test  sequences  of  length  eleven  are  sufficient  for  accurate  classification.  The  test 
sequences  used,  however,  could  not  possibly  be  comprehensive  -  there  are  uncountably  many 
possible  test  sequences  -  so  they  would  have  to  be  chosen  with  some  application  or  geometric 
criterion  in  mind.  It  should  be  considered,  too,  that  “rogue”  test  sequences  can  be  contrived 
such  that  a  short  test  length  classification  will  be  more  accurate  than  ones  of  longer  length. 

Figure  39  illustrates  the  uncertainty  inherent  in  spatio-temporal  classification.  The  curve 
nutrked  Tm^o  represents  a  portion  of  a  training  trajectory  in  3?^*  corresponding  to  a  particular 
viewed  object  M60.  The  points  A,  B,  and  C  represent  points  in  3f?^*  derived  from  consecutive 
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Figure  39.  Portions  of  Trajectories  in 


testing  views  of  the  object  taken  near,  but  not  on,  the  training  trajectory;  so  do  the  points 
D,  E,  F,  and  G,  obtained  during  a  separate  test.  Suppose  that  all  the  points  A  through  G  are 
at  the  same  small  Euclidean  distance  from  points  on  T\f6o>  and  that  none  of  the  other  testing 
points  are  as  close.  How  likely  is  it  that  the  object  viewed  during  the  test  traversal  which  yielded 
the  points  A,  B,  and  C  is  an  M60?  Less  likely,  intuitively,  than  the  object  viewed  during  the 
test  traversal  which  yielded  the  points  D,  E,  F,  and  G,  because  the  latter  testing  trajectory 
tracked  the  training  trajectory  over  a  longer  period  of  time.  Similarly,  greater  confidence  would 
attach  to  five  consecutive  close  points  than  to  four.  The  significance  of  the  number  eleven  is 
that  if  eleven  consecutive  testing  points  coincide  exactly  with  eleven  consecutive  points  on 
and  the  testing  sequence  was  obtained  following  the  same  viewing  quadrant  traversal  as 
the  one  which  yielded  Tm^o,  then  the  testing  trajectory  and  Tmgo  must  be  identical.  For  testing 
trajectories  not  directly  on  Tmgo,  twelve  or  more  consecutive  near  points  may  reasonably  be 
interpreted  as  stronger  evidence  of  an  M60  than  eleven  consecutive  near  points.  Nevertheless, 
lacking  specification  of  allowable  test  sequences,  the  embedology-derived  number  eleven  may 
be  a  reasonable  compromise  test  sequence  length  to  use  in  practice. 
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4. 6  Discussion  of  Viewing  Sphere  Traversal  Strategies 

The  foregoing  use  of  a  Lorenz  viewing  sphere  traversal  is  not  meant  to  imply  advantage 
for  that  strategy  as  opposed  to  any  other  strategy.  Any  traversal  which  gives  feature  vectors 
as  a  continuous  function  of  viewing  position  will  yield  a  feature  space  trajectory  which  has  a 
fractal  dimension  (this  is  because  any  traversal  of  the  viewing  sphere  is  a  compact  subset  of 

As  a  finite-length  continuous  curve  in  feature  space,  however,  its  fractal  dimension  is 
one.  On  the  other  hand,  the  fractal  dimension  of  a  discrete  sampling  of  points  on  the  curve 
may  well  be  greater  than  one.  Consider,  for  example,  a  finite-length  Lorenz  trajectory  in 
As  a  curve  in  3?^,  it  has  a  fractal  dimension  of  one.  However,  a  discrete  sampling  of  points 
on  the  trajectory  will  exhibit  self-similarity  in  space,  and  a  fractal  dimension-determining 
algorithm  will  yield  values  approaching  2.05  for  the  set  of  points  (as  the  length  of  the  Lorenz 
curve  becomes  large). 

An  experiment  conducted  with  another  viewing  sphere  traversal  strategy,  a  sort  of 
random  walk  scan  of  the  viewing  quadrant,  suggests  that  a  fractal  structure  to  the  viewing 
traversal  may  simplify  the  determination  of  fractal  dimensions  of  feature  space  solution 
trajectories.  Perhaps  the  training  evolutions  “inherit”  some  of  the  fractal  nature  of  a  fractal 
viewing  trajectory.  Figure  40  illustrates  a  typical  Grassberger  and  Procaccia  plot  associated 
with  the  H*'*  Fourier  component  of  a  randomly- viewed  M60  tank.  The  relative  lack  of 
collinearity  of  the  data  points  (compare  Figure  36)  weakens  confidence  in  any  fractal  dimension 
derived  from  the  “common”  slope  of  the  line  segments  joining  them  (17). 

On  the  other  hand,  fractal  structure  in  the  viewing  traversal  may  have  the  effect  of 
increasing  the  number  of  points  required  for  determination  of  the  training  trajectory.  Consider 
again  the  intensity  feature  for  the  painted  cylinder  discussed  in  Section  4.4.  If  a  second 
cylinder  is  introduced  for  a  classification  problem,  identical  to  the  first  except  that  the  second 
cylinder’s  white  paint  is  brighter,  then  fractal  dimension-based  classification  using  the  Lorenz 
traversal  will  require  five  views  of  a  test  object  (i.e.,  one  of  the  two  cylinders)  to  make  a 
classification.  This  is  because  the  fractal  dimension  of  the  Lorenz  attractor  is  about  2.05,  and 
five  is  the  smallest  integer  greater  than  twice  this  dimension.  On  the  other  hand,  if  a  simple 
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Figure  40.  G.  and  P.  Plot  for  an  M60  Viewed  along  a  Random  Trajectory 

east-to-west,  west-to-east  raster  traversal  had  been  used  instead  of  the  Lorenz,  then  fractal 
dimension-based  classification  will  require  only  three  views  of  the  test  object.  This  is  because 
the  fractal  dimension  of  the  sinusoidal  intensity  features  of  both  objects  is  one. 

Notice  that  both  methods  of  traversal  can  be  spoofed  by  “unanticipated”  test  sequences. 
The  raster  traversal  method,  for  instance,  could  easily  be  fooled  by  a  north-to-south  test 
sequence.  This  points  out  what  is  perhaps  the  biggest  obstacle  to  constmcting  successful 
spatio-temporal  classifiers:  anticipating  likely  test  sequences  and  accounting  for  them  in 
training  trajectories. 

4. 7  Conclusion 

This  chapter  has  demonstrated  that  the  goal  of  moving  object  recognition  might  be 
facilitated  if  the  motion  (or  apparent  motion)  of  the  objects  is  chosen  carefully  and  appropriate 
features  are  utilized.  A  fractal  dimension  of  object  evolutions  can  be  obtained  and  exploited  if 
the  views  of  the  given  objects  form  a  compact  subset  of  the  viewing  space,  and  the  features  used 
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vary  continuously  with  viewing  position.  The  fractal  dimension  is  exploited  via  the  embedding 
theorem  to  determine  a  test  sequence  length  sufficient  for  complete  object  determination. 

A  nearest  neighbor  spatio-temporal  classifier  was  implemented  to  test  the  theoretically 
determined  sufficient  length  of  test  sequence  against  Fourier  data  obtained  from  models  of 
five  military  objects.  These  objects  were  observed  wnile  following  a  Lorenz-derived  traversal 
of  a  viewing  sphere.  The  results  revealed  that  indeed  the  theoretically  sufficient  test  sequence 
length  was  sufficient;  but  much  shorter  lengths  also  sufficed. 
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V.  Conclusions 


The  Deterministic  Versus  Stochastic  (DVS)  algorithm  developed  by  Martin  Casdagli 
uses  the  local  linear  prediction  technique  described  by  Doyne  Farmer  to  help  decide  whether 
a  given  time  series  represents  low  dimensional  (deterministic)  chaotic  dynamics,  or  is  high 
dimensional  (or  stochastic)  in  origin.  This  algorithm  has  been  modified  here  to  produce 
two  new  prediction  methodologies,  each  of  which  selectively  uses  embedding  space  nearest 
neighbors.  Both  have  been  shown  advantageously  applicable  to  prediction  of  noisy  time  series 
(such  as  experimentally  measured  blood  oxygen  concentration  data,  and  certain  financial  data). 

The  first  algorithm,  pruned  outliers  prediction,  excludes  from  the  DVS-determined 
optimal  number  of  neighbors  nearest  the  prediction  point,  those  of  them  which  lie  farthest 
from  the  linear  prediction  hyperplane.  A  new  hyperplane  is  calculated  based  on  the  remaining 
neighbors,  and  the  value  assumed  on  this  hyperplane  at  the  prediction  point  becomes  the 
predicted  value. 

The  second  algorithm,  overlap  prediction,  endeavors  to  combine  the  benefits  of  using 
one  short  and  one  longer  time  delay.  Prediction  is  based  on  the  shorter  of  the  two  time  delays, 
but  intervals  of  time  are  allowed  to  be  represented  in  the  utilized  set  of  nearest  neighbors  only 
if  they  play  a  role  in  both  short  term  and  long  term  prediction. 

In  each  new  algorithm,  neighbors  which  are  considered  prediction-relevant  are  retained 
for  local  linear  prediction,  while  those  which  are  considered  likely  to  represent  noise  are 
ignored.  These  algorithms  may  in  this  sense  be  considered  to  employ  embedding  space 
filtrations  of  the  time  series.  For  many  time  series,  it  was  found  rather  easy  to  improve  on 
unfiltered  local  linear  prediction  with  one  or  both  of  the  new  algorithms.  For  other  time 
series,  prediction  improvement  was  more  difficult.  It  was  argued  that  prediction  improvement 
difficulty  is  indicative  of  stochastic  data,  independently  of  the  direct  results  of  the  DVS 
algorithm. 

This  research  has  shown  that  the  local  linear  technique  provides  perfect  predictions  in 
cases  where  phase  space  trajectories  are  concentric  circles,  and  where  they  arc  locally  parallel. 
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This  helps  to  explain  why  attempts  to  use  higher-order  polynomial  regression  prediction 
sometimes  provide  little  benefit  over  linear  regression  prediction  (8:846). 

Another  new  result  gives  an  invariance  condition  for  the  fractal  dimension  of  a  time 
series.  Specifically,  a  theorem  is  proven  which  shows  that  the  fractal  dimension  of  a  time 
series  does  not  change  if  a  differentiable  mapping  which  is  strictly  monotonic  on  the  series’ 
range  of  values  is  applied  to  it. 

It  has  also  been  shown  that  the  problem  of  moving  object  classification  may  be  analyzed 
in  terms  of  embedded  time  series,  in  the  sense  that  embedology  can  supply  a  reasonable  length 
of  test  sequence  for  accurate  classification.  Sequentially  recorded  feature  vectors  of  a  moving 
object  form  a  training  trajectory  in  feature  space.  Each  of  the  sequences  of  feature  vector 
components  is  a  time  series,  and  under  certain  conditions,  each  of  these  time  series  will  have 
approximately  the  same  fractal  dimension.  The  embedding  theorem  may  be  applied  to  this 
fractal  dimension  to  establish  a  number  of  observations  sufficient  to  determine  the  feature 
space  trajectory  of  the  object.  It  was  argued  that  this  number  is  a  reasonable  test  sequence 
length  for  use  in  object  classification.  Experiments  with  data  corresponding  to  live  military 
vehicles  (observed  following  a  projected  Lorenz  trajectory  on  a  viewing  sphere)  showed  that 
this  length  was  indeed  adequate. 
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Appendix  A.  Fractal  Dynamics 

In  his  book  Fractals  Everywhere,  Michael  Barnsley  provides  a  precise  definition  of 
chaos  and  a  proof  that  certain  dynamical  systems  are  chaotic.  He  also  shows  how  calcu¬ 
lated  trajectories  can  maintain  their  fractal  quality  even  in  the  presence  of  computational 
errors  (2),  His  development  is  based  on  the  theory  of  iterated  function  systems,  which  is 
briefly  summarized  in  this  appendix.  Although  Barnsley’s  book  is  remarkably  self-sufficient, 
a  good  first  course  in  mathematical  analysis  is  helpful.  Lang’s  book  Analysis  I  is  especially 
recommended  (19). 

For  brevity,  this  Appendix  omits  certain  definitions  and  proofs  when  it  is  felt  that 
these  omissions  will  not  hinder  a  broad  understanding  of  discrete  chaotic  dynamical  systems. 
Barnsley’s  book  is  the  appropriate  source  for  all  details. 

A  contraction  mapping  is  a  function  /  :  X  -♦  X  on  a  metric  space  (X,  d)  (that  is,  a 
set  X  with  distance  metric  d  :  X  x  X  -+  3?)  for  which  there  is  a  constant  0  <  s  <  1  (called 
a  contractivity  factor  for  /)  such  that  for  all  x  and  y  in  X, 

d{f{x),f{y))  <s-d{x,y) 

A  hyperbolic  iterated  function  system  (IFS)  consists  of  a  complete  metric  space  X  together 
with  a  finite  set  of  contraction  mappings  Wn  :  X  X,  with  respective  contractivity  factors 
s„,  for  n  =  1,2, The  notation  for  this  IFS  is  {X-,Wn,n  =  1,2, ...,A^}.  Its 
contractivity  factor  is  s  =  Max{sn  :  n  =  1,2,...,  N).  The  word  hyperbolic  means  that  each 
contractivity  factor  satisfies  0  <  <  1;  unless  the  context  clearly  indicates  otherwise,  an 

IFS  is  understood  to  be  hyperbolic. 

The  closed  unit  interval  [0, 1]  in  3?,  with  the  distance  between  two  points  being  the 
absolute  value  of  their  difference,  is  an  example  of  a  complete  metric  space.  The  function 
wi(x)  =  x/3  is  an  example  of  a  contraction  mapping  on  X,  with  contractivity  factor  1  /3;  so 
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is  the  function  W2{x)  =  (x  +  2)/3.  Thus  {[0, 1];  u>i,  ^2}  is  an  IFS,  with  contractivity  factor 
1/3. 

Let  {X]Wn,  n  =  1,2,  ...,N}  be  an  IFS  with  contractivity  factor  s  on  the  complete 
metric  space  There  is  a  derived  metric  space  'H{X)  whose  elements  are  nonempty 

compact  subsets  of  X  and  whose  metric  is  the  Hausdorff  distance  (denoted  h{d)  or  sometimes 
simply  h)  between  such  subsets.  With  this  metric,  H{X)  is  also  complete,  so  the  Contraction 
Moping  Theorem  applies  to  it.  The  upshot  is  that  the  transformation  W  :  'H{X)  -+  HiX) 
defined  by 

WiB)  =  U  w„(B) 

n=l 

for  all  B  €  H{X)  is  a  contraction  mapping  on  the  complete  metric  space  i'H{X),  h{d))  with 
contractivity  factor  s.  That  is. 


h{W{B),W{C))<s-h{B,C) 
for  all  B,  C  €  H{X).  Its  unique  fixed  point,  A  €  H{X),  obeys 

A  =  W{A)  =  (J  wM) 

n=l 

and  is  given  by  A  =  lim„_oo  W°^{B)  for  any  B  G  HiX),  where  W°^{B)  denotes  n 
iterations  of  the  mapping  W. 

The  fixed  point  A  G  'H{X)  is  called  the  attractor  of  the  IFS. 

Recall  the  classical  Cantor  set  C  created  by  successive  deletions  of  middle  thirds  of 
closed  intervals,  beginning  with  the  closed  unit  interval.  The  set  C  is  the  attractor  of  the  IFS 
{[0, 1];  u;i(x)  =  ix,  W2{x)  =  5x4  §}  just  described.  This  IFS  is  totally  disconnected-,  since 
its  maps  wi  and  W2  are  injective,  this  means  that  C  is  the  disjoint  union  of  the  images  of  itself 
under  wi  and  W2.  That  is,  C  =  W{C)  =  wi{C)  U  W2{C)  disjointly.  Equivalently  (in  the  case 
of  subsets  of  3ft),  C  contains  no  intervals  (6:37). 
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Notice  that  foragenerallFS,  an  infinite  composition  of  mappings  •  •  (y) 

specifies  a  unique  point  on  the  attractor  of  the  IFS,  where  the  a,-  are  taken  from  the  set 
{1,2,...,  N}  of  indices  of  the  contraction  mappings  and  where  y  is  any  nonempty  compact 
subset  of  X.  For  example,  the  point  1/3  in  C  can  be  written  wi  o  W2  o  W2  o  W2  o  •  •  •  (0). 
This  fact  allows  a  natural  correspondence  between  the  attractor  A  of  the  IFS  and  a  complete, 
compact  metric  space  called  code  space  whose  elements  are  infinite  sequences  of  symbols 
taken  from  {1,2,...,  N}.  Code  space  is  denoted  (S,  dc)  (or  sometimes  simply  E)  where  dc 
is  a  metric  defined  by 


dc{u),a) 


for  alia;,  a  €  S. 


The  natural  correspondence  alluded  to  is  in  fact  a  continuous  transformation  ;  S  — >  A 
given  by  <i>{a)  =  o  o  o  •  •  •  (y)  where  a  =  (Ti<T2cr3  •  •  •  and  y  is  any  element  of 
HiX).  For  the  Cantor  set  example,  <^(1222-  •  •)  =  1/3.  The  transformation  <f>  is  always 
surjective  but  it  is  not  necessarily  injective.  The  IFS  is  totally  disconnected  iff  <i>  is  injective. 

There  is  another  metric  on  E  which  allows  one  to  picture  code  space  as  a  subset  of  the 
unit  interval.  In  particular,  let  (f2  :  S  x  E  -♦  3?  be  defined  by 


<^2(w,  a) 


Then  d2  is  a  metric  on  E,  and  (E,  dc)  and  (E,  d2)  are  equivalent  metric  spaces.  This  means  that 
points  that  are  close  in  (E,  dc)  are  also  close  in  (E,  d2),  and  that  there  is  no  infinite  stretching 
or  compression  when  going  from  (E,  dc)  to  (E,  di)  (or  from  (E,  di)  to  (E,  dc)).  The  d2  metric 
on  E  is  precisely  the  absolute  value  of  the  difference  between  points  in  the  unit  interval  whose 
{N  +  l)-ary  fractional  expansions  contain  no  zeroes.  For  example,  with  =  2, 1/2  can  be 
represented  in  the  unit  interval  as  .1 1 1  •  •  •  where  the  leading  period  denotes  the  ternary  point; 
11/18  can  be  represented  .12111  •  •  sand  |.lll - .12111  •  •  •  |  =  1/9. 

The  concept  of  dynamical  system  can  now  be  defined,  and  subsequently  the  notion  of 
a  chaotic  dynamical  system. 
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A  dynamical  system  {X]  /}  is  a  mapping  f  :  X  -*  X  on  &  metric  space  X.  The  orbit 
of  a  point  x  €  A"  is  the  sequence 

Examptes  of  dynamical  system  abound.  Two  interesting  examples  of  dynamical  systems 
on  the  unit  interval  [0, 1]  are  given  in  Figure  41. 


fi[x)»Min(2x.2-2x)  g(x)  >  2x  (mod  1) 


Figure  41.  Example  Dynamical  Systems  on  the  Unit  Interval 


Iterated  function  systems  admit  especially  interesting  types  of  dynamical  systems.  Let 
{X',Wn^n  =  1 , 2, . . . ,  JV}  be  a  totally  disconnected  IFS  with  attractor  A.  Then  A  is  the  union 
of  the  to„(A),  where  for  all  i  ^  j,  Wi{A)  n  iWj(A)  =  0.  The  associated  shift  transformation 
on  A  is  the  transformation  S  :  A~*  A  defined  by 

5(a)  =  w"^(a) 

for  a  e  u;„(A),  where  u;„  is  viewed  as  a  transformation  from  A  onto  iy„(  A).  The  dynamical 
system  (A;  5}  is  called  the  shift  dynamical  system  associated  with  the  IFS. 

Using  the  shift  transformation  of  a  totally  disconnected  IFS,  it  is  straightforward  to  trace 
the  orbit  of  a  given  point  of  A.  A  similar  procedure  can  be  used  to  trace  orbits  when  the  IFS  is 
not  totally  disconnected,  i.e.,  when  the  code  space  mapping  :  S  -+  A  is  not  injective.  For 
clarity,  attention  is  here  restricted  to  IFS’s  with  only  two  contraction  maps.  Let  {A;  wi,  u;2} 
be  an  IFS  with  attractor  A.  Assume  that  both  wi  and  W2  are  injective.  A  sequence  of  points 
{x„}^o  ^  called  an  orbit  of  the  random  shift  dynamical  system  ass(x;iated  with  the  IFS 
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if  for  each  n  e  {0, 1, 2, . . 


®n+l 


f*^r^(^n)  when  Xn  €  «?i(A),  and  Xn  wi(A)  D  W2{A), 
^(®n)  when  x„  €  t(;2(A),  and  Xn  ^  lui(A)  n  tw2(A), 
one  of  {u>i'^(x„),u>J*(xn)},  when  x„  G  i«i(A)  D  u;2(A) 


The  notation  x„4.i  =  5(x„)  is  adopted,  although  there  may  be  no  well-defined  transformation 
S  :  A  A  which  makes  this  true.  The  pair  {A;  5}  is  called  the  random  shift  dynamical 
system  associated  with  the  IFS. 

An  example  of  an  orbit  from  a  random  shift  dynamical  system  is  provided  in  Figure  42. 
The  applicable  (overlapping)  IFS  is  {[0,  l];i£)i(x)  =  |x,iy2(x)  =  |x  -t- with  respective 


Figure  42.  An  Example  Orbit  (2:154) 
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inverse  maps  2x  and  52^  —  5.  Notice  that  in  the  overlapping  region,  the  choice  of  inverse  map 
or  iwj *  is  random. 

A  deterministic  dynamical  system  can  be  constructed  which  acts  on  a  higher  dimensional 
space  and  whose  projection  into  the  original  sp^  X  yields  the  random  shift  dynamics  just 
described.  It  turns  out  that  the  space  A*  x  S  is  a  complete  metric  space,  with  metric  defined 
as  the  maximum  of  the  pair  of  metrics  and  dc  acting  onX  x  X  and  S  x  E  respectively. 
The IFS  associated  with  the  IFS  {X  :  wi,w2}  isthelFS  {X  x  E-,wi,W2}  whereEis 
the  code  space  on  two  symbols  {1,2}  and 

wi{x,<t)  =  (u;i(x),  Icr)  for  all  (x,<t)  €  A"  x  E 
tZj2(x,<T)  =  {w2{x),2(t)  for  all  (x,  O’)  6  A  x  E 

The  attractor  A  of  the  lifted  IFS  is  the  graph  of  the  code  space  map  (i>\  that  is,  A  = 
{(<^(<t),(7)  :  €  E).  Its  projection  into  the  original  space  A  is  simply  the  attractor  A  of 

the  original  IFS.  Furthermore,  A  is  totally  disconnected  in  the  sense  that  the  projection  map 
from  A  into  E  is  one-to-one  on  E.  Figure  43  shows  a  lifting  of  the  the  attractor  A  of  the 
IFS  {[0, 1];  wi{x)  —  |x,  iW2(x)  =  |x  -t-  |}.  If  the  maps  wi  and  1x2  in  the  original  IFS  are 
injective,  then  the  lifted  IFS  is  totally  disconnected.  In  this  case,  the  shift  dynamical  system 
{A,  5}  associated  with  the  lifted  IFS  is  called  the  lifted  shift  dynamical  system  associated  with 
the  IFS.  The  action  of  5  on  A  is  given  by 

S{x,(t)  =  (u;Jj‘(x),  <72(73(74  •••) 


for  all  (x,  a)  =  (x,  (T\G2<T3  •  •  •)  in  A. 

The  following  “Shadow  Theorem”  reveals  that  the  orbits  of  a  large  class  of  (possibly 
overlapping)  IFS’s  can  be  viewed  as  projections  of  orbits  of  totally  disconnected  IFS’s. 

The  Shadow  Theorem:  Let  {A  :  u;i,u;2}  be  an  IFS  with  injective  transformations  wi 
and  W2  and  attractor  A.  Let  [xn}^o  of  fhe  associated  random  shift  dynamical 
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Figure  43.  A  Lifted  Attractor  (derived  from  (2:158)) 

system  {A;  S}.  Then  there  is  an  orbit  {5n}^o  the  lifted  shift  dynamical  system  {A;  §} 
such  that  the  first  component  of  Xn  is  x„  for  all  n. 

Figure  43  also  illustrates  the  shadow  theorem. 

Even  if  one  were  fortunate  enough  to  be  able  to  identify  a  point  xq  on  an  attractor  with 
perfect  precision  (ie,  to  know  perfectly  <t  €  S  such  that  <^(<7)  =  xo),  he  or  she  probably 
wouldn’t  be  able  to  calculate  its  oibit  exactly  at  each  step.  That  is,  one  wouldn’t  be  able  to 
calculate  exactly  the  points  ii  =  S{xq),  xj  =  5(xi), . . ..  Fortunately,  there  is  a  “Shadowing 
Theorem’’  which  reveals  that,  regardless  of  how  many  small  errors  are  made,  there  is  an  exact 
oibit  which  lies  at  every  step  within  a  small  distance  of  the  errorful  one. 

The  Shadowing  Theorem:  Let  {X;  loi ,  ti;2>  •  •  • ,  }  be  an  IFS  of  contractivity  s,  where 

0  <  s  <  1.  Let  A  denote  the  attractor  of  the  IFS  and  suppose  that  each  of  the  transformations 
io„  :  A  -♦  A  is  injective.  Let  {A;  S}  denote  the  associated  shift  dynamical  system  in  the 
case  that  the  IFS  is  totally  disconnected;  otherwise  let  {A;  5}  denote  the  associated  random 
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shift  dynamical  system.  Let  {xn}S^o  C  A  be  an  approximate  orbit  of  S,  such  that 

<i(xn+i,‘S'(x„))  <6  for  all  n  =  0, 1,2,3, .. . 

for  some  fixed  constant  0  with  0  <  d  <  diam(A).  Then  there  is  an  exact  orbit  {x„  = 
5®”(xo)}^o  for  some  xo  €  A,  such  that 

30 

d(xn+i , ^n+i )  —  ^  ~  0,1,2,3,... 

(1  -s) 

Figure  44  shows  an  approximate  orbit  of  the  point  xq  on  an  attractor  A  (a  Sierpinski 


Tkt  Skadawing  TVnmn 
ItSt  m  then  it  on  exact 
erbil  whidt  It  dour  ie 
(i.}  Uum  0.01  for  all  a. 


Figure  44.  The  Effect  of  Computational  Errors  (2:163) 

triangle).  The  shadowing  theorem  assures  that  there  is  a  point  xo  on  A  whose  exact  orbit 
remains  less  than  or  equal  to  sdl{\  -  s)  at  each  pair  of  points  on  the  orbits. 

Most  of  the  concepts  necessary  to  define  the  phrase  “chaotic  dynamical  system”  have 
now  been  presented.  Only  a  few  more  definitions  are  needed. 


Let  (A",  (f)  be  a  metric  space.  A  subset  .0  C  A  is  said  to  be  dense  in  A  if  the  closure 
of  B  equals  A.  A  sequence  {a:n}^o  of  points  in  A  is  said  to  be  dense  in  A  if,  for  each 
point  a  €  A,  there  is  a  subsequence  which  converges  to  a.  In  particular,  an  orbit 

{x„}^o  of  a  dynamical  system  {A;  /}  is  said  to  be  dense  in  A  if  the  sequence  {a^n}^o  is 
dense  in  A. 

A  dynamical  system  {A;  /}  is  transitive  if,  whenever  if  and  V  are  open  subsets  of  the 
metric  space  (A,  d),  there  exists  a  finite  integer  n  such  that 

U  n  /°“(V)  ^  0 

The  dynamical  system  {[0, 1];  f{x)  =Min{2x,  2  —  2x}}  depicted  in  Figure  41  is  transitive. 

The  dynamical  system  {A;  /}  is  sensitive  to  initial  conditions  if  there  exists  (5  >  0 
such  that,  for  any  x  €  A  and  any  closed  ball  0(x,  c)  with  radius  e  >  0  there  is  y  €  B{x,  c) 
and  an  integer  n  >  0  such  that  <f(/®”(x),  /°”(y))  >  S. 

This  means  roughly  that  orbits  which  begin  close  together  get  pushed  apart  by  the 
action  of  the  dynamical  system.  The  dynamical  system  {[0, 1];  ^(x)  =  2x(modl)}  depicted 
in  Figure  41  is  sensitive  to  initial  conditions. 

A  dynamical  system  { A;  /}  is  now  defined  chaotic  if  it  is  transitive,  sensitive  to  initial 
conditions,  and  the  set  of  periodic  orbits  of  /  is  dense  in  A. 

The  following  theorem  provides  a  wealth  of  chaotic  dynamical  systems. 

Theorem;  The  shift  dynamical  system  associated  with  a  totally  disconnected  IFS  of  two 
or  more  transformations  is  chaotic. 

In  particular,  the  shift  dynamical  system  associated  with  the  classical  Cantor  set  C  is 
chaotic.  This  means  that  if  the  points  on  C  are  “backed  out”  using  the  inverses  of  the  maps 
which  provided  convergence  to  them,  the  resulting  set  of  orbits  is  chaotic. 
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Appendix  B.  Fractal  Interpolation  Functions 

Consider  the  set  of  points 

{(xo,  Fo),  (xi,  Fi), {xpi,  Fn)} 

in  where  xo  <  xi  <  ...  <  xs-  There  are  a  number  of  ways  these  points  can  be 
interpolated  with  continuous  functions  on  the  interval  [xo,xa^];  the  most  familiar  of  these 
interpolate  using  differentiable  functions.  There  is,  however,  a  family  of  functions  which  are 
not  necessarily  differentiable  and  which  also  interpolate  these  points.  These  functions  are 
derived  from  mappings  called  shear  transformations  (2:214).  Consider  for  example  the  three 
points  on  the  left  of  Figure  45.  An  interpolating  function  is  sought  which  joins  these  points 


Figure  45.  A  Parabolic  Interpolation  of  Three  Points 

(0,0),  (0.5, 0.5),  and  (1,0).  One  such  function,  a  (differentiable)  quadratic  polynomial,  is 
illustrated  on  the  right  of  Figure  45.  It  is  also  possible  to  produce  curves  in  3?^  which  are  the 
attractors  of  iterated  function  systems  (IFS’s;  see  Appendix  A)  and  which,  viewed  as  real¬ 
valued  functions  on  [xo,x/v],  interpolate  the  points  (0,0),  (0.5, 0.5),  and  (1,0).  The  linear 
splines  interpolation  illustrated  on  the  left  of  Figure  46  is  the  attractor  of  the  IFS  consisting  of 
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0  J_  1  0  2  1 

2  2 

Figure  46.  Two  Fractal  Interpolation  Functions  (2:223) 


the  shear  transformations 


The  more  jagged  interpolation  function  shown  on  the  right  of  Figure  46  is  the  attractor 
of  the  iterated  function  system 


Although  neither  of  the  functions  shown  in  Figure  46  are  differentiable  at  every  point  of 
the  interval  [0, 1],  iterated  function  systems  can  yield  attractors  whose  graphs  are  everywhere 
differentiable  functions.  For  example,  the  parabolic  curve  shown  on  the  right  of  Figure  45  is 
the  attractor  of  the  IFS 


Notice  that  the  IFS’s  represented  in  systems  (6),  (7),  and  (8)  differ  only  in  the  lower 
right  entries  of  the  2  x  2  matrices.  It  turns  out  that  as  long  as  all  these  entries  (called 
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vertical  scaling  factors)  are  less  than  one  in  magnitude,  there  is  a  metric  on  with  respect 
to  which  the  mappings  fi,f2,9i,92^hu  and  /12  are  contraction  mappings  (2:214).  Hence 
each  of  the  systems  (6),  (7),  and  (8)  truly  are  IFS’s,  so  for  each  there  really  is  an  attractor 
in  3?^.  The  contraction  mappings  exemplified  in  systems  (6),  (7),  and  (8)  are  called  shear 
transformations;  because  of  the  upper  right  O’s  in  all  the  2  x  2  matrices,  these  mappings  take 
lines  parallel  to  the  y-axis  to  lines  which  are  also  parallel  to  the  y-axis  (2:214).  Consider 
the  attractor  A  depicted  in  Figure  46.  Given  any  nonempty  compact  subset  of  infinite 
iteration  of  it  by  the  equations  of  system  (7)  (as  described  in  Appendix  A)  yields  the  nonempty 
compact  attractor  A.  Since  A  is  an  attractor,  A  =  9i{A)  U  92{A).  In  this  example,  the  first 
half  of  A  is  a  contracted  counterclockwise  rotation  of  all  of  A,  and  the  second  half  of  A  is 
obtained  by  flipping  A  vertically,  ro^^iting  it  clockwise,  contracting  it,  and  translating  it  to  the 
second  half  of  the  unit  interval. 

More  generally,  any  function  F  defined  on  an  interval  [xcXa/]  with  known  values 
Fq,Fi,...,Fn  at  the  points  xo  <  xi  <  . . .  <  can  be  interpolated  by  the  attractor 
of  an  IFS  consisting  of  shear  transformations;  the  graph  of  such  an  attractor  is  called  a 
fractal  interpolation  function  (2:220).  It  is  often  possible  to  obtain  a  close  approximation 
of  the  underlying  function  using  far  fewer  than  N  shear  transformations,  and  the  savings  in 
representational  data  required  can  be  considerable.  Consider  how  succinct  is  the  representation 
(6)  of  the  function  A  depicted  on  the  right  of  Figure  46  compared,  for  example,  to  its 
representation  as  a  sum  of  sines  and  cosines.  Even  if  A  had  been  sampled  at  thousands  of 
points,  no  closer  fractal  approximation  could  be  obtained  than  by  ignoring  all  but  the  first, 
middle,  and  last  values.  This  is  because  of  an  inherent  self-similarity  within  A  -  portions  of 
A  are  merely  affine  transformations  (albeit  sometimes  hard  to  see)  of  all  of  A. 

As  will  be  shown,  strange  attractors  can  variously  be  considered  real-valued  functions 
on  a  subset  of  a  vector  space,  or  vector  valued  functions  on  a  real  interval.  It  was  felt  that 
their  chaotic  natures  might  allow  them  to  be  represented  with  fewer  shear  transformations  than 
embedded  data  from  a  random  sequence  time  series.  Experimentation,  however,  provided  no 
strong  evidence  of  such  a  relationship. 
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A  strange  attractor,  such  as  the  Lorenz  or  Henon  attractor,  represents  the  steady  state, 
nonperiodic  solution  of  a  differential  (or  difference)  equation  and  as  such  never  intersects 
itself  (18:168).  Suppose  an  attractor  B  is  the  solution  of  an  equation  in  n  variables,  so  that 
its  “native”  space  is  3?".  The  nonintersecting  characteristic  of  B  implies  that  if  points  on  it 
are  represented  in  by  (n  +  l)-tuples  consisting  of  the  points  and  the  times  at  which 
they  are  generated,  the  resulting  set  of  points  define  a  real- valued  function  on  Consider 
for  example  the  two-dimensional  Henon  attractor  depicted  in  Figure  7.  It  is  generated  by 
successive  application  of  the  equations 

x(i-l-l)  =  1 -I- 7/(i)  —  1.4x^(i) 
y{i  -1-1)  =  0.3x(i) 

from  some  given  initial  condition  (any  initial  condition  will  result  in  the  same  general 
form  (31:73)).  Suppose  that  at  each  step  in  the  generation  of  the  Henon  attractor,  a  time 
is  impressed,  so  that  instead  of  ordered  pairs  being  generated,  ordered  triples  are  generated, 
according  to  the  rules 


-f  1)  =  i  -l- 1 

x(i -|- 1)  =  1 -H  y(i)  —  1.4x^(i) 
y{i  -1-1)  =  0.3x(i) 

The  resulting  set  of  points  may  be  considered  a  real-valued  function  of  the  pairs  (xj+i ,  j/,+i ), 
or  a  vector-valued  function  of  time.  The  first  25  of  them  are  depicted  in  Figure  47.  If  the 
points  were  connected  sequentially,  the  resulting  graph  would  be  a  curve  in  3?^. 

It  is  possible  to  approximate  this  curve  as  an  attractor  of  an  IFS  in  much  the  same 
way  that  the  parabolic  curve  of  Figure  45  can  be  approximated  by  the  attractors  depicted  in 
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Figure  47.  Time-Stamped  Henon  Attractor 


Figure  46  (22).  In  general  suppose  a  set  of  iV  +  1  points 
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are  available  in  with  to  <  ti  <  •  • '  <  tpf.  These  might  be,  for  example,  the  25  points 
depicted  in  Figure  47.  For  any  two  time  indices  t,,  <  it  is  possible  to  define  a  contraction 
mapping  ;  S*  — »  3?*  by 
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which  takes  all  points  with  time  indices  between  to  and  to  points  with  time  indices  between 
and  Ij.  The  2k  parameters  an  through  a*i  and  bi  through  6*  may  be  determined  by  the 
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2k  linear  equations  in  2k  unknowns  resulting  from  the  endpoint  conditions 
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Generalizing  some  work  by  Mazel  and  Hayes  (23),  it  turns  out  that  each  of  the  remaining 
k  —  1  scaling  factors  ajj,  j  =  2, 3, . . . ,  fc,  can  be  approximated  as  the  ratio  of  the  maximum 
j/j-i -deviation  of  the  curve  in  the  interval  to  the  maximum  -deviation  in  the 

interval  [to,  f  a^],  provided  this  ratio  is  less  than  one.  (This  ratio  can  always  be  made  less  than 
one  by  taking  t,,  close  enough  to  .)  Consider  for  example  Figure  48,  which  shows  how 
the  scaling  factor  ajj  can  be  determined  for  a  shear  transformation  with  endpoint  conditions 


Figure  48.  Determining  Fractal  Interpolation  Scaling  Factors 
at  tyv-e  and  In.  First  the  iV  -f  1  points  given  in  (9)  are  projected  onto  the  plane. 
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Then  the  deviation  d  is  determined  as  the  maximum  vertical  distance  from  the  line  joining 
the  endpoints  at  to  and  t/v  to  the  set  of  projected  points.  Then  the  deviation  di  is  determined 
as  the  maximum  vertical  distance  from  the  line  joining  the  endpoints  at  In-s  and  tN-  The 
sign  of  a  deviation  is  positive  if  the  point  of  maximizing  distance  lies  above  the  line  joining 
the  endpoints  (so  that  d,  is  positive  in  Figure  48).  Conversely,  the  sign  of  a  deviation  is 
negative  if  the  point  of  maximizing  distance  lies  below  the  line  joining  the  endpoints  (so  that 
d  is  negative  in  Figure  48).  The  scaling  factor  Ojj  =  di/d.  Notice  that  if  the  interval  being 
considered  had  been  ^a^],  then  there  would  be  zero  vertical  deviation  in  this  interval, 
whence  ajj  =  0.  Indeed,  if  the  entire  interval  from  h  to  tN  is  partitioned  point-by-point  into 
iubintervals  [ti,  U+i],  then  for  every  subinterval,  ajj  =  0  for  each  j  =  1, 2, . . . ,  fc  -  1.  In 
this  case  the  fractal  interpolation  of  the  set  of  points  (9)  is  piecewise  linear  in  and,  with 
N  shear  transformations,  the  number  (3/:  —  1)  x  AT  of  parameters  needed  to  specify  the  set 
of  points  is  larger  than  the  number  (A^  -I- 1)  x  of  components  required  to  specify  the  points 
directly. 

A  fractal  interpolation  function  is  the  attractor  of  a  set  S  of  shear  transformations  with 
endpoint  conditions  which  partition  the  interval  [<o,iiv]-  It  may  do  either  a  good  or  poor 
job  approximating  the  given  set  of  points  (9).  A  closely  approximating  fractal  interpolation 
function  can  usually  be  fitted  recursively  to  the  points  (9)  using  a  higher-dimensional  version 
of  an  approach  described  by  Mazel  (23).  From  the  final  point  at  tN,  a  search  is  performed 
backwards  through  all  previous  points  at  times  calculating  at  each  the  parameters  of 
the  shear  transformation  Wi  determined  by  the  interval  [ti,tN]  and  recording  the  maximum 
separation  of  the  original  set  of  points  (9)  from  their  images  in  [t,,  t  n]  .  The  endpoint  t,  t  v-i 
which  results  in  the  lowest  computed  error  is  selected  as  the  new  right  endpoint  (replacing 
tN)  and  its  corresponding  shear  transformation  iw,  becomes  one  member  of  the  set  S.  This 
process  is  repeated  on  the  left  subinterval  [to,  t,]  of  [to,  tA/],  isolating  the  interval  [tj,  t,]  and 
associated  shear  transformation  Wj  which  results  in  the  least  separation  of  the  given  set  of 
points  from  their  images  under  Wj  in  [tj,  t^].  Eventually  the  left  subinterval  is  selected  in  its 
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entirety,  or  it  becomes  [to,  ^2];  in  either  case,  the  set  S  of  shear  transformations  is  completely 
determined. 

Barnsley’s  collage  theorem  (2:96)  can  be  used  to  obtain  a  bound  on  the  error  associated 
with  interpolating  the  set  of  points  (9)  using  the  set  S  of  shear  transformations.  The  error 
bound  was  not  computed  in  this  research.  Rather,  the  principle  focus  was  on  comparing  the 
sets  S  obtained  for  pseudo-random  time  series  with  the  sets  S  obtained  for  chaotic  time  series, 
to  see  if  the  two  types  of  time  series  could  be  distinguished  by  the  gross  properties  of  these 
sets. 

A  number  of  approaches  were  tried;  none  were  found  particularly  revealing.  In  one 
experiment,  the  lengths  of  best  end  intervals  were  determined  for  each  of  the  last  82  points 
in  the  embeddings  of  various  time  series,  each  of  length  1082.  (One  of  these  was  a  pseudo¬ 
random  time  series  generated  using  Press’s  routine  “ran3”  (29:283).)  For  each  embedded  time 
series,  the  best-  fitting  end  interval  of  maximum  length  and  the  average  of  the  82  end  interval 
lengths  were  recorded.  In  addition  to  the  pseudo-random  time  series,  five  chaotic  time  series 
were  examined:  Henon  and  Lorenz  data,  and  Glass-Mackey  data  with  fractal  dimensions  of 
5.0, 5.5,  and  6.0  (15:74).  The  average  end  interval  lengths  ranged  from  about  11  to  about  39, 
with  the  pseudo-random  data  third  lowest  at  about  25. 

It  might  be  expected  that  a  random  sequence  would  have  a  shorter  average  end  interval 
length  than  that  of  chaotic  data;  after  all,  there  should  be  no  self-similarity  within  random  data 
to  allow  compression  of  the  entire  data  set  into  large  end  intervals.  Perhaps  this  result  was  not 
observed  because  the  “random”  data  set  used  was  not  truly  random  but  merely  pseudo-random. 
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Appendix  C.  C  Program  Source  Listings 

C.l  Description 

The  bodies  of  the  programs  used  are  listed  in  the  next  section.  Subroutines  called  (many 
from  Press’s  book  Numerical  Recipes  in  C  (28))  are  listed  in  the  last  section. 

The  parameters  in  the  first  program,  fdfinder4.c,  are  set  at  values  which  seem  to  reveal 
well  the  fractal  dimension  of  the  Henon  attractor.  Applying  progressively  larger  values  of 
ymax,  up  to  about  4000  (or  higher,  if  one  is  willing  to  tolerate  lengthy  computer  run  times), 
an  apparent  convergence  to  the  published  value  1.25  (17:193)  can  be  seen.  An  input  file 
consisting  of  Henon  map  y-components  (accurate  to  five  decimal  places)  should  be  used,  with 
the  Henon  map  parameters  set  to  a  =  1.4,  6  =  0.3  (35:312).  The  first  three  values  of  the 
y-component  time  series  are  0.00000,  0.30000,  and  -0.12000. 

The  remaining  five  programs  all  have  parameters  set  for  processing  the  sleep  apnea  data 
set  described  in  Section  3.7.  The  program  casdagli7.c  generates  all  DVS  data;  it  also  imple¬ 
ments  DVS  prediction  when  the  parameter  kbest  is  set  to  a  predetermined  optimal  number 
of  nearest  neighbors.  The  program  casdaglil3.c  implements  pruned  outliers  prediction.  Pro¬ 
grams  casdaglil7.c,  casdaglilS.c,  and  casdaglil9.c  are  applied  sequentially  to  implement  the 
overlap  prediction  algorithm;  casdaglil7.c  must  be  run  twice  (using  different  output  file  names 
with  each  run)  to  produce  lists  of  candidate  interval  endpoints.  The  output  file  goodngbrs 
produced  by  program  casdaglilS.c  contains  the  number  of  retained  intervals  as  its  first  entry. 
This  value  must  be  removed  from  goodngbrs  and  provided  to  program  casdaglil9.c  prior  to 
execution  of  casdaglil9.c,  which  produces  the  actual  predictions  in  the  file  casdata.  The  user 
may  find  it  convenient  to  rename  the  output  files  with  names  corresponding  to  the  program 
which  produce  them;  for  example,  replace  casdata  in  file  casdagli7.c  with  a  tile  named  cas7out, 
replace  casdata  in  file  casdaglil3.c  with  casl3out,  etc. 
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C.2  Main  Programs 

C.  2. 1  Grassberger  and  Procaccia  Algorithm. 

/*  This  program,  fdiinder4.c,  finds  the  fractal  dimension  of  a  time  series  xi 
whose  values  are  in  the  hie  cantvals.  It  implements  the  algorithm 
described  by  Grassberger  and  Procaccia  in  their  article  "Measuring  the 
Strangeness  of  Strange  Attractors."  This  program  was  written  by  Jim 
Stright  in  December  1992  and  is  an  adaptation  of  the  Ada  program  fractall 
which  he  wrote  in  1988.  Many  of  the  subroutines  are  taken  from  Press  et 
ah  "Numerical  Recipes  in  C."  See  closing  comments  for  output  format.*/ 

#include  <stdio.h> 

#include  <stdlib.h> 

#include  <math.h> 

float  *vector(); 
int  *ivector(); 
float  **matrix(); 

void  free.vector(),freeJvector(),free-matrix(); 

void  main(void) 

{ 

FILE  ♦fp;  /*  input  time  series  is  in  file  cantvals  */ 

int  D  =  5;  /*  nbr  of  elts  in  each  xi  vector  */ 

int  ymax  =  4000;  /*  an  upper  bound  for  the  nbr  of  D-tuples  */ 

float  distance; 
float  L0=  1.0; 

int  pts-onJine  =  8;  /*  nbr  of  pts  on  "line"  whose  slope  is  frac  dim  */ 

float  slope; 
int  sum; 
float  suml; 

int  y3=  1 ;  /*  counts  nbr  of  D-tupIes  */ 

int  zl=l ;  /*  counts  position  within  each  D-tuple  */ 

int  z3=0;  /*  counts  nbr  of  time  series  vals  in  input  file  */ 

int  i,j,q,r;  /*  misc.  counters  */ 

float  *L; 
int  *CL; 
float  **xi; 

L  =  vectorfl.pts-onJine); 

CL  =  ivector(l,pts_on_line); 
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xi  =  matrix(l,ymax,l,D); 

!*  open  cantvals  for  input  */ 
if  ((fp  =  fopen("cantvals","r"))  =  NULL)  { 
printf("Cannot  open  fileXn"); 
exit(l); 


/*  read  in  the  first  D-tuple  of  data  */ 
while  (zl  <=  D)  { 

fscanf(fp,  "%r,  &xi[y3][zl]); 
zl  =  zl  +  1; 
z3  =  z3  +  1 ; 

} 

zl  =  1; 
y3  =  y3  +  1; 

I*  read  in  all  subsequent  data  points  ♦/ 
while  ((!feof(fp))  &«&  (y3  <=  ymax))  { 
while  (zl  <  D)  { 

xi[y3][zl]  =  xi[y3-l][zl  +  l]; 
zl  =  zl  +  1; 

} 

fscanf(fp,  "%f',  &xi[y3][D]); 
z3  =  z3  +  1 ; 
zl  =  1; 
y3  =  y3  + 1; 

} 

y3  =  y3-l; 

I*  define  the  small  radii  L[i]  */ 
for  (i  =  0;  i  <=  pts_on_line  - 1 ;  i++)  { 

L[i  +  1]  =  exp(0.5*i  -  4.5);  /*  L[i  +  1]  =  exp(0.125*i  -  2.2);  */ 

} 

/*  find  the  CL[i]  values  */ 
for  (i  =  1 ;  i  <=  pts-onJine;  i++)  { 
sum  =  0; 

for(j  =  l;j<y3;j++)  { 

for  (q  =  j  +  1 ;  q  <=  y3;  q++)  { 
suml  =  0.0; 

for  (r  =  1;  r  <=  D;  r++)  { 
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suml  =  suml  +  pow(xi|j][r]  -  xi[q][r],2.0); 

} 

/*  suml  is  now  squared  distance  from  xilj]  to  xi[ql  *1 
distance  =  sqrt(suml); 
if  (distance  <  L[i]) 
sum  =  sum  +  1; 

} 

/*  sum  is  now  the  nbr  of  D-tuples  near  xi|jl  */ 

} 

/*  sum  is  now  the  nbr  of  D-tuples  within  L[i]  of  each  other  */ 

CL[i]  =  sum; 

} 

/*  Output  the  results  */ 

printfC'The  number  of  time  series  values  in  cantvals  is  %d  .\n\n",  z3); 
for  (i  =  1;  i  <=  ptS-onJine;  i++)  { 
printf(’’%3.3r,  log(L[i]/L0)); 
printf("  "); 

printf(’’%3.3f\n’‘,  log(CL[i])); 

} 

printfC'Nn”); 
if(CL[l]  =  0.0) 

printf("CL[l]  =  0;  make  L[l]  larger\n"); 
else  { 

/*  output  the  slopes  consecutively,  computed  from  the  first  point  */ 
for  (i  =  2;  i  <=  ptS-onJine;  i++)  { 

slope  =  (log(CL[il)-log(CL[ll))  /  (log(L[i]/L0)-log(L[l]/L0)); 
printf(”%3.Ji\n”,  slope); 

} 

/*  last  slope  output  is  best  "quick"  estimate  of  fractal  dimension  */ 

} 

free-vector(L,  1  ,pts  .on  Jine); 
free  Jvector(CL,  1  .pts.onJine); 
freejnatrixfxi,  1  ,ymax,  1  ,D); 
fclose(fp); 


/*  The  program’s  output,  should  all  go  well,  will  look  something  like  this: 
The  number  of  time  series  values  in  cantvals  is  105(X). 
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( these  are  the  coordinates  (ln(L[i]/LO,lnCL[i]) ) 

1.93  -2.97 

2.32  (slope  from  point  1  to  point  2) 

2.37  (slope  from  point  1  to  point  3) 


2.40  (slope  from  point  1  to  point  ptS-onJine)  */ 


C.2.2  DVS  Prediction. 

/*  This  program,  casdagli7.c,  implements  the  forecasting  algorithm  described 
on  page  307  of  Casdagli’s  article  "Chaos  and  Deterministic  versus 
Stochastic  Non-linear  Modelling."  It  was  written  in  May  1993  by  Jim 
Stright.  Casdagli7.c  also  provides  a  prediction  of  a  single  value  beyond 
the  end  of  the  data  used  for  testing  (an  unknown).  It  does  so  using  the 
best  m  &  k  as  found  from  previous  runs  of  this  program.  Usually  Nt=0  is 
used  in  the  prediction  mode,  with  Nf  the  number  of  time  series  values 
assumed  known.  Many  of  the  subroutines  are  taken  from  Press  et  al, 
"Numerical  Recipes  in  C."  As  a  predictor,  casdagli7.c  implements 
"DVS  prediction."  */ 

#include  <stdio.h> 

#include  <stdlib.h> 

#include  <math.h> 

#defineTINY1.0e-20 
double  *dvector(); 
int  *ivector(); 
double  **dmatrix(); 
double  momentO; 
double  ludcmpO; 
double  lubksbO; 
double  sort2(); 

void  free_dvector(),free_dmatrix(),freeJvector(); 


void  main(void) 


HLE  •fpl,  *fo2- 

/*  fpl  is  tsdata  (input);  fp2  is  casdata  ♦/ 

int  m=:14; 

/*  embedding  dim  */ 

inttau  =  1; 

/*  delay  time  */ 

int  T  =  tau; 

/♦  forecasting  time;  not  necessarily  tau!!  */ 

int  kbest  =  286; 

/*  replace  with  best  k,  else  use  2*(m+l)  */ 

int  ij,ctrl,ctr2,ctr3; 

/*  counters  */ 

int  row,col,l; 

/*  more  counters  */ 

int  k; 

/*  nbr  of  nearest  neighbors  */ 

int  klast  =  0; 

/*  This  counter  is  at  the  nbr  (+2*(m+l))  of  the 
last  nearest  neighbor  incorporated  in  the 

A  matrix  */ 

int  Nf=  2400; 

/*  nbr  of  time  series  values  in  fitting  set  */ 

int  Nt  =  270; 

/*  nbr  of  time  series  values  in  testing  set  */ 

int  Ns  =  1 ; 

/*  spacing  of  the  sampled  delay  vectors  */ 

int  n; 

/*  required  for  call  to  "moment"  */ 

int  FLAG  =  0; 

/*  used  in  ludcmp  for  "too  large"  check  */ 

int  kexp  =  0; 

/*  counter  for  exponential  spacing  of  k’s  */ 

double  kbase  =  2.0; 

/♦  base  for  exponential  spacing  of  k’s  */ 

double  avc,adev,sigma,svar, skew, curt; 

/♦  all  of  these  required  for  call  to  "moment", 
although  only  sigma  is  used  in 
casdagli7.c;  see  Press  Ed  2,  p.613  *! 
double  *ave-ptr=&ave,*adev_ptr=&adev,*sigma_ptr=&sigma; 
double  * s var_ptr=&s var, ♦skew _ptr=&skew,*curt-ptr=&curt ; 
double  *x; 

double  **A,**Alud,**d,*dhold,*alpha,*b,dnr; 

/*  Alud  is  repeatedly  destroyed  by  ludcmp,  dnr  is  Press’s  d,  p.46  ♦/ 
int  *indx; 
int  *nbrtested; 

double  **xhat,**e,*Em,errsum; 

X  =  dvector(l,Nf+Nt); 
indx  =  ivector(  1  ,m+ 1 ); 

nbrtested  =  ivector(0,Nf-T-(m-l)*tau-2*(m+l)); 

A  =  dmatrix(l,m+l,l,m+l); 

Alud  =  dmatrix(l,m+l,l,m+l); 
d  =  dmatrix(Nf,Nf+Nt,l,Nf-T-(m-l)*tau); 

/*  d[i][  1  ]  is  the  distance  from  vector  x[i]  to  vector  x[  1  +(m- 1  )*tau]; 
d[i][2]  is  the  distance  from  vector  x[ij  to  vector  x[l+(m-l  )*tau+l  ]; 

d[i][Nf-T-(m-l)*tau]  is  distance  from  vector  x[i]  to  vector  x[Nf-T], 
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before  a  swap  for  nearness  is  performed.  */ 
dhold  =  dvector(l,Nf-T-(ra-l)*tau); 
alpha  =  dvector(l,m+l); 
b  =  dvector(l,m+l); 

xhat  =  dmatrix(0,Nf-T-(m-l)*tau-2*(m+l),Nf+T,Nf+Nt+T); 
e  =  dniatrix(0,Nf-T-(m-l)*tau-2*(m+l).Nf,Nf+Nt); 

Era  =  dvector(0,Nf-T-(ra-l)*tau-2*(m+l)); 

I*  open  tsdata  for  input  */ 
if  ((fpl  =  fopen("tsdata","r"))  =  NULL)  { 
printfC'Cannot  open  file  lsdata\n"); 
exit(l); 

} 

/♦  open  casdata  for  output  */ 
if  ((fp2  =  fopenC’casdata",  "w"))  ==  NULL)  { 
printf("Cannot  open  file  casdataNn"); 
exit(l); 

} 


/•  read  in  the  time  series  data  ♦/ 
for  (ctrl=l;  ctrl<=Nf+Nt;  ctrl++)  { 
fscanf(fpl,  "%lf’,  &x[ctrl]); 

} 

/*  compute  distances  d[i][j]  and  load  d  matrix  with  nearness  indices  */ 
for  (i=Nf;  i<=Nf+Nt;  i++)  { 

for  (j=l  ;  j<=Nf-T-(m-l)*tau;  j++)  {  I*  see  indexing  note  below  */ 
d[i]Ij]  =  fabs(x[i]  -  x[j+(m-l)*tau]); 
for  (ctrl=tau;  ctrl<=(m-l)*tau;  ctrl=ctrl+tau)  { 
if  (fabsfx[i-ctrl]-xlj+(m-l)*tau-ctrl])  >  d[i]|j])  { 
d[ijlj]  =  fabs(x[i-ctrl]-xlj+(m-l)*tau-ctrl]); 

} 

} 

/*  dist  d[i]Ij]  between  vctrs  x[i]  &  x[j+(m-l)*tau]  is  fixed  */ 

} 

/*  the  distances  d[i]|j]  are  now  established  for  all  j  ♦/ 

/*  initialize  the  index-swap  vector  dhold  */ 
for  (ctr2=l ;  ctr2<=Nf-T-(m-l)*tau;  ctr2++)  { 
dhold[ctr2]  =  ctr2  +  (m-l)*tau; 
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t*  now  the  contents  of  (lhold[ll,  eg,  is  l+(m-l)*taii  *! 

} 

/*  Sort  the  contents  of  the  vector  d[i]  and  simultaneously  sort  the 
vector  dhold  into  ascending  order  of  nearness  of  vectors  to  x[i]; 
see  Press  Ed  2,  page  334.  */ 

sort2(Nf-T-(m-l)*tau,  d[i].  dhold); 

/*  replace  contents  of  vector  d[i]  with  indices  of  vectors  arranged 
in  ascending  order  of  nearness  to  x[i]  */ 

for  (ctr3=l;  ctr3<=Nf-T-(m-l)*tau;  ctr3++)  { 
d[i][ctr3]  =  dhold[ctr3]; 

} 

/♦  Now  the  contents  of  vector  d[i]  is  the  set  of  indices  of  vectors 
compared  for  nearness  to  vector  x[i],  arranged  in  ascending  order 
of  nearness  to  x[i].  eg,  d[i][ll  is  the  index  of  the  vctr 
nearest  x[i].  */ 


/*  Find  standard  dev  sigma  for  the  time  series;  see  Press  2,  p.613.*/ 
n  =  Nf+Nt; 

moment(x,n,ave_ptr,adev-ptr,sigma_ptr,svar-ptr,skew_ptr,curt_ptr); 

fprintf(fp2,"Data  output  from  program  casdagli7.c\n"); 

fprintf(fp2,"m=%2d  \  n",m); 

fprintf(fp2,"Nf  =  %d\n",Nf); 

fprintf(fp2,"Nt  =  %d\n",Nt); 

fprintf(fp2,"T  =  tau  =  %d\n",T); 

fprintf(fp2,"average  data  value  ave  =  %2.5f\n",ave); 

fprintf(fp2,"daia  standard  deviation  sigma  =  %2.5f\n",sigma); 

/*  Initialize  the  max  nbr  of  vectors  to  be  compared  for  nearness  */ 
k  =  0; 
kexp  =  0; 

while  (k  <=  Nf-T-(m- 1  )*tau-2*(m+ 1 ))  { 

nbrtested[k]  =  (Nt-T+1)/Ns;  /*  recall  int  division  trucates  */ 
k  =  (int)  pow(kbase,kexp); 
kexp  =  kexp  +  1; 

} 

I*  Establish  the  error  matrix  e[k][i]  */ 
for(i=Nf;i<=Nf+Nt;  i++)  { 
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J*  Initialize  the  A  matrix  at  k=2*(m+l)  ♦/ 
t*  First  the  diagonal  entries:  *1 
Alll[l]  =  2*(m+1); 

Alud[l][ll  =  A[l][l]; 
for(ctrl=2;  ctrl<=(m+l);  ctrl++)  { 
A[ctrl][ctrl]  =  0.0; 
for  (ctr2=l;  ctr2<=2*(m+l); cti2++)  { 
A[ctrl][ctrl]  =  A[ctrl][ctrll 
+  x[(intXd[i][ctr2])-(ctrl-2)*taul 
*  x[(intXd[i][ctr2])-(ctrl-2)*taul; 

} 

Alud[ctrll[ctrl]  =  A[ctrll[ctrll; 


/*  Now  the  first  row  (and  first  column)  entries:  */ 
for  (col=2;  col<=(m+l);  col++)  { 

A[l][col]  =  0.0; 

for  (1=1;  l<=2*(m+I);  1++)  { 

A[l][col]  =  A[l][col]  +  x[(intXd[i][l])-(col-2)*tau]; 

} 

Alud[l][coll  =  A[l][col]; 

Alcollil]  =  Atl][coll; 

Alud[coll[l]  =  A(col][ll; 


/*  Now  initialize  the  off-diag,  off-first-row-or-col  entries:  */ 
for  (row=2;  row<=m;  row++)  { 

for  (col=row+l;  col<=(m+l);  col++)  { 

A[row][col]  =  0.0; 

for  (1=1;  l<=2*(m+l);  1++)  { 

A[row][col]  =  A[row][col] 

+  x[(intXd[i]Il])-(row-2)*taul 
*  x[(intXd[il[ll)-(col-2)*tau]; 

} 

Alud[row][col]  =  A[row][col]; 

A[coll[row]  =  A[row][coIl; 

Alud[col][rowl  =  A[coll[row]; 

} 

} 

I*  And  last,  initialize  the  b  vector,  and  its  equal  alpha  vector; 
alpha  gets  replaced  with  the  proper  solution  when  one  solves 
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A*alpha  =  b  as  A*x=b=alpha;  see  Press,  page  44.  */ 
b(ll  =  0.0; 

for  (1=1;  l<=2*(m+l);  1=1+1)  { 
b[l]  =  b[l]  +  x[(intXd[i]Il])+Tl; 

} 

alphain  =  b[l]; 

for  (row=2;  row<=(m+l);  row++)  { 
b[row]  =  0.0; 

for  (1=1;  l<=2*(m+l);  1++)  { 
b[row]  =  b[row] 

+  x[(intXd[i][l])-(row-2)*tau] 

*  x[(mtXd[iI[ll)+Tl; 

} 

alpha[row]  =  b[row]; 

} 

/*  Go  after  the  e[kl[i];  may  easily  change  the  k  indexing  to  sample 
k’s  at  exponentially  spaced  intervals.  In  what  follows,  k 
equals  the  nbr  of  neighbors  nearest  xfi]  minus  2*(m+l).  */ 

klast  =  0; 
k=0; 

kexp  =  0; 

while  (k  <=  Nf-T.(m-l)*tau-2*(m+l))  { 

/*  Nf-T-(m-l)*tau  is  the  nbr  of  nghbrs  in  fitting  set  whose 
"predicted"  value  is  <=  Nf  */ 

/*  Update  the  A  matrix  */ 

/*  First  update  the  diagonal  entries  */ 

A[l](l]  =  2*(m+l)  +  k; 

Alud[l][l]  =  A[l][ll; 
for  (ctrl=2;  ctrl<=(m+l);  ctrl++)  { 
for  (ctr2=l ;  ctr2<=(k-klast);  ctr2++)  { 

A[ctrl][ctrl]  =  A[ctrl][ctrl] 

+  x[(intXd[i][2*(m+ 1  )+klast+ctr2])-(ctrl  -2)*tau] 

*  x[(intXd[i]  [2*(m+ 1  )+klast+ctr2])-(ctr  1  -2)*tau] ; 

} 

Alud[ctrl][ctrl]  =  A[ctrl][ctrl]; 

} 
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I*  Now  update  the  first  row  (and  first  column)  entries:  *! 
for  (col=2;  col<=(m+l);  col++)  { 
for  (1=1;  l<=(k-klast);  1++)  { 

A[l][coll  =  A[ll[col] 

+  x[(intXd[i][2*(m+l  )+klast+l])-(col-2)*tau]; 

} 

Alud[l][colI  =  A[l][coll; 

A[col][l]  =  A[ll[col]; 

Alud[col][ll  =  A[col][ll; 


I*  Now  update  the  off-diag,  off-first-row-or-col  entries:  */ 
for  (row=2;  row<=m;  row++)  { 

for  (col=row+l;  col<=(m+l);  col++)  { 
for(l=l;l<=(k-klast);l++){ 

A[row][col]  =  A[rowl[coll 

+  x[(intXd[i]  [2*(m+ 1  )+klast+ll)-(row-2)*tau] 

*x[(intXd[i][2*(m+l)+klast+l])-(col-2)*tau]; 

} 

Alud[rowl(col]  =  A[rowl[coll; 

A[col][row]  =  A[row][col]; 

Alud[colHrowl  =  A[coll[row]; 

} 

} 

t*  Finally,  update  the  b  vector:  */ 
for  (1=1;  l<=(k-klast);  1++)  { 

b[l]  =  b[l]  +  x[(intXd[i][2*(m+l)+klast+l))+T]; 

} 

alpha[l]  =b[l]; 

for  (row=2;  row<=(m+l);  row++)  { 
for(l=l;l<=(k-klast);l++){ 
b[row]  =  b[row] 

+  x[(intXd[i][2*(m+l)+klast+l])-(row-2)*tau] 
*x[(intXd[il[2*(m+l)+klast+l])+T]; 

} 

alpha[row]  =  b[row]; 


klast  =  k; 

/*  Solve  the  normal  eqtns  for  alpha[l]  thru  alpha[m+l]  *! 
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ludcmp(  Alud,FLAG,m+ 1  ,indx,&dnr); 

if(FLAG=l){ 

alpha[ll  =  1000001; 

FLAG=0;} 
else  { 

lubksb(  Alud,m+ 1  ,indx,alpha); 

} 

/*  alpha[l],alpha[2],...  are  now  optimum  in  Casdagli’s  eqtn  S,  if 
the  normal  equations  admit  a  solution.  Otherwise  set  alpha[l] 
=  x[i+T],  alpha[2]=alpha[3]=...=alphalm+l]=0,  so  that 
xhat[k][i+T]  =  x[i+T],  the  exact  data  value.  Also,  decrement 
nbrtested  so  this  unusual  event  isn’t  included  in  Em[k].  *! 
for  (ctrl=l;  ctrl<=(m+l);  ctrl++)  { 
if  ((fabs(alphalctrl])  >  1000000)11 
(alpha[ctrl]  =  HUGE_VAL))  { 
nbrtested[k]  =  nbrtested[k]-l; 
fprintf(fp2, "Error  at  1  \n"); 
alpha[l]  =  x[i+T]; 
for  (ctr2=2;  ctr2<=(m+l);ctr2++)  { 
alpha[ctr2]  =  0.0; 

} 

break; 

} 

} 

xhat[k][i+T]  =  alpha[l]; 

for  (ctrl=2;  ctrl<=(m+l);  ctrl++)  { 

xhat[k][i+T]  =  xhat[k][i+T]  +  alpha[ctrl]*x[i-(ctrl-2)*tau]; 

} 

I*  xhat[k][i+T]  has  now  been  established  */ 
if(i<=Nf+Nt-T) 

e[k][i]  =  fabs(xhat[k][i-«-T]  -  x[i+T]); 
else 

e[k][i]  =  0.0; 

k  =  (int)  pow(kbase,kexp); 
kexp  =  kexp+  1; 

}  /*  closes  the  loop  over  k’s  */ 

}  I*  closes  the  loop  over  i’s  •/ 


101 


k  =  0; 
kexp  =  0; 

whUe  (k  <=  Nf-T-(m-l)*tau-2*(m+l))  { 
emum  =  0.0; 

for  (i=Nf;  i<=Nf+Nt-T;  i++)  { 

errsum  =  errsum  +  e[k][i]*e[k][i]; 

} 

Em[k]  =  (sqrt(errsum/Qbrtested[k]))/sigma; 

fprintf(fp2,  "%d",  2*(m+l)+k);  /*  Output  nbr  of  nearest  nghbrs  */ 
fprmtf(fp2,"  "); 

fprintf(fp2,  "%1.6f\n'',  Ein[k]);  /*  Output  forecasting  error  */ 

k  =  (int)  pow(kbase,kexp); 
kexp  =  kexp  +  1; 


fprintf(fp2,  "Predicted  value  at  time  %d  is  %f\n", 

Nf+Nt+T,xhat[kbest-2*(m+I)l[Nf+Nt+T]); 

freejdvector(x,l  ,Nf+Nt); 

freejdvector(dhold,l  ,Nf-T-(m-l  )*tau); 

free  jdvector(alpha,  1  ,m+ 1 ); 

free  jdvector(b,  1  ,m+ 1 ); 

freejdvector(Em,0,Nf-T-(m- 1  )*tau-2*(m+ 1 )); 

free  Jvector(indx,  1  ,m+ 1 ); 

free  J  vector(nbrtested,0,Nf-T-(m- 1  )*tau-2  *(m+ 1 )); 

free  jdmatrix(  A,  1  .in+ 1 , 1  ,ni+ 1 ); 

free  jdn]atrix(  Alud,  1  ,m+ 1 , 1  ,ni+ 1 ); 

frcejdmatrix(dJ^f,Nf+Nt-T,ld^f-T-(m-l)*tau); 

freejdmatrix(xhat,0,Nf-T-(m-l  )*tau-2*(m-f  1  ),Nf+T,Nf+Nt); 

freejdinatrix(e,0,Nf-T-(m-l  )*tau-2*(m+l  ),Nf,Nf+Nt-T); 

fclose(fpl); 

fclose(fp2); 


C.2.S  Pruned  Outliers  Prediction. 

/*  This  program,  casdaglil 3.c,  adapts  the  forecasting  algorithm  described 
on  p.  307  of  Casdagli’s  article  "Chaos  and  Deterministic  versus  Stochastic 
Non-linear  Modelling."  Written  Nov  1993  by  Jim  Stright,  it  is  a 
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modification  of  program  casdagli7.c.  CasdaglilS.c  provides  a  prediction  of 
a  single  value  beyond  the  end  of  the  data  used  for  testing  (an  unknown).  It 
does  so  using  the  best  k  (kbest)  as  found  from  a  previous  run  of  program 
casdagli7.c.  Rather  than  use  all  kbest  nearest  nghbrs,  it  uses  only  those 
p  of  them  which  are  close  to  the  hyperplane  linear  regression  best  fit  of 
the  kbest.  Fix  Nt  =  0  in  this  program;  there  are  Nf  known  time  series 
values.  Many  of  the  subroutines  are  taken  from  the  book  by  Press  et  al, 
"Numerical  Recipes  in  C."  Casdaglil3  c  implements  "pmned  outliers 
prediction."  */ 

#include  <stdio.h> 

#include  <stdlib.h> 

#include  <math.h> 

#defineTINY1.0e-20 
double  ♦dvectorO; 
int  *ivector(); 
double  **dmatrix(); 
double  momentO; 
double  ludcmpO; 
double  lubksbO; 
double  sort2(); 

void  free-dvector(),free_dmatrix(),freeJvector(); 
void  main(void) 


FILE  *fpl,  *fp2; 

/*  fpl  is  tsdata  (input);  fp2  is  casdata  */ 

int  m=14; 

/*  embedding  dim  */ 

int  tau=l; 

/*  delay  time  *! 

int  i,j,ctrl,ctr2,ctr3; 

1*  counters  */ 

int  row,col,l.q; 

/*  more  counters  */ 

int  p; 

/*  nbr  of  nearest  nghbrs  close  to  l.reg.  line  */ 

int  k; 

/*  ctr  of  nbr  of  nearest  neighbors  */ 

int  kbest  =  286; 

/*  from  casdagli7.c’s  output  */ 

int  klast  =  0; 

/*  This  counter  is  at  the  nbr  (+2*(m+l))  of  the 
last  nearest  neighbor  incorporated  in  the 

A  matrix  */ 

int  Nf=  2670; 

/*  nbr  of  time  series  values  in  fitting  set  */ 

int  Nt  =  0; 

1*  nbr  of  time  series  values  in  testing  set  */ 

int  Ns  =  1; 

1*  spacing  of  the  sampled  delay  vectors  */ 

int  T  =  tau; 

/*  forecasting  time;  not  necces.  tau!!  */ 

int  n; 

/*  required  for  call  to  "moment"  */ 

int  FLAG  =  0; 

/*  used  in  ludcmp  for  "too  large"  check  */ 
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int  kexp  =  0; 
double  kbase  =  2.0; 
double  sig; 
double  gamma  =  2.0; 


t*  counter  for  exponential  spacing  of  k’s  */ 
/*  base  for  exponential  spacing  of  k’s  */ 

I*  a  measure  of  nearness  to  lin.regr.  line  *! 
/♦  may  want  to  try  eg  gamma  =  3.0  */ 


double  ave,adcv,sigma,svar,skew,curt; 

/*  all  of  these  required  for  call  to  "moment", 
although  only  sigma  and  ave  used  in 
casdaglil3.c;  see  Press  Ed  2,  p.613  */ 
double  *ave_ptr=&ave,*adev_ptr=&adev,*sigma-ptr=&sigma; 
double  *svar_ptrs:&svar,*skew4)tr^«&:skew,*curt-pti:^&curt; 
double  *x,*xlr,*er; 


double  **A,**Alud,**d,*dhold,*alpha,*b,dnr; 

/*  Alud  is  repeatedly  destroyed  by  ludcmp,  dnr  is  Press’s  d,  p.46  *! 
int  *indx; 
int  *nbrtested; 

int  *dn;  /*  used  to  index  nearest  nghbrs  close  to  lin.  reg.  line  */ 
double  **xhat,**e,*Em,errsum; 

X  =  dvector(l,Nf+Nt+T); 
xlr  =  dvector(l,kbest); 
er  =  dvector(l,kbest); 
indx  =  ivector(l,m+l); 

nbrtested  =  ivector(0,Nf-T-(m-l)*tau-2*(m+l)); 
dn  =  ivector(l,kbest); 

A  =  dmatrix(l,m+l,l,m+l); 

Alud  =  dmatrix(  1  ,m+ 1 , 1  ,m+ 1 ); 
d  =  dmatrix(Nf,Nf+Nt,l,Nf-T-(m-l)’'‘tau); 

/*  d[i]  [  1  ]  is  the  distance  from  vector  x[il  to  vector  x[  1  +(m- 1  )*tau] ; 
d[i][21  is  the  distance  from  vector  x[i]  to  vector  x[l+(m-l)*tau+l]; 

d[i][Nf-T-(m-l)*tau]  is  distance  from  vector  x[i]  to  vector  x[Nf-T], 
before  a  swap  for  nearness  is  performed.  */ 
dhold  =  dvector(l,Nf-T-(m-l)*tau); 
alpha  =  dvector(l,m+l); 
b  =  dvector(l,m+l); 

xhat  =  dmatrix(0,Nf-T-(m-l)*tau-2*(m+l),Nf+T,Nf+Nt+T); 
e  =  dmatrix(0,Nf-T-(m-l)*tau-2*(m+l),Nf,Nf+Nt); 

Em  =  dvector(0,Nf-T-(m-l)*tau-2*(m+l)); 


/*  open  tsdata  for  input  */ 
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if  ((fpl  =  fopen("tsdata","r"))  =  NULL)  { 
printf("Caimot  open  file  tsdataXn"); 
exit(l); 


I*  open  casdata  for  output  *! 
if  ((fp2  =  fopen("casdata",  "w"))  ==  NULL)  { 
printfC’Cannot  open  file  casdata\n”); 
exit(l); 


I*  read  in  the  time  series  data  */ 
for  (ctrl=l;  ctrl<=Nf+Nt+T;  ctrl++)  { 
fscanf(fpl,  "%lf',  &x[ctrl]); 

} 

!*  compute  distances  d[i](j]  and  load  d  matrix  with  nearness  indices  *! 
for  (i=Nf;  i<=Nf+Nt;  i++)  { 

for(j=l;  j<=Nf-T-(m-l)*tau;  j++)  {  /*  see  indexing  note  below  */ 
d[i][j]  =  fabs(x[i]  -  x|j+(m-l)*tau]); 
for  (ctrl=:tau;  ctrl<=(m-l)*tau;  ctrl=ctrl+tau)  { 
if  (fabs(x[i-ctrll-xlj+(m-l)*tau-ctrl])  >  d[i]|j])  { 
cl[i]|j]  =  fabs(x[i-ctrl]-xlj+(m-l)*tau-ctrll); 

} 

} 

/*  dist  d[i]lj]  between  vctrs  x[i]  &  x[j+(m-l)*tau]  is  fixed  */ 

} 

/*  the  distances  d[i]Ij]  are  now  established  for  all  j  */ 

/*  initialize  the  index-swap  vector  dhold  */ 
for  (ctr2=l;  ctr2<=Nf-T-(m-l)*tau;  ctr2++)  { 
dhold[ctr2]  =  ctr2  +  (m-l)*tau; 

/♦  now  the  contents  of  dhold[l],  eg,  is  l+(m-l)*tau  */ 

} 

I*  Sort  the  contents  of  the  vector  d[i]  and  simultaneously  sort  the 
vector  dhold  into  ascending  order  of  nearness  of  vectors  to  x[il; 
see  Press  Ed  2,  page  334.  */ 
sort2(Nf-T-(m-l)*tau,  d[i],  dhold); 

/*  replace  contents  of  vector  d[i]  with  indices  of  vectors  arranged 
in  ascending  order  of  nearness  to  x[i]  *! 
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for  (ct*-^=l;  ctr3<=Nf-T-(m-l)*tau;  ctr3++)  { 
d[i][ctr3]  =  dhold[ctr3]; 

} 

I*  Now  the  contents  of  vector  d[i]  is  the  set  of  indices  of  vectors 
compared  for  nearness  to  vector  x[i],  arranged  in  ascending  order 
of  nearness  to  x[i];  eg,  d[i][l]  is  index  of  vctr  nearest  x[i].  */ 

} 

I*  Find  standard  dev  sigma  for  the  time  series;  see  Press  2,  page  613.*/ 
n  =  Nf+Nt; 

moment(x,n,ave-ptr,adev_ptr,sigma_ptr,svar^tr,skew^tr,curt_ptr); 

fprintf(fp2,’'Data  output  from  program  casdaglil3.c\n"); 

fprintf(fp2,"m=%2d  \  n",m); 

fprintf(fp2,"Nf  =  %d\n",N0; 

fprintf(fp2,"Nt=  %d\n",Nt); 

fprintf(fp2."T  =  tau  -  %d  \n",T); 

fprintf(fp2,"gamma  =  %f\n", gamma); 

fprintf(fp2,"Global  optim  nbr  nearest  nghbrs  kbest=%3d\n",kbest); 
fprintf(fp2,"average  data  value  ave  =  %2.5f\n",ave); 
fprintf(fp2,"data  standard  deviation  sigma  =  %2.5f\n", sigma); 

i  =  Nf+Nt;  /♦  i  remains  fixed  now  at  the  last  known  ts  value  */ 

/*  Initialize  the  A  matrix  at  k=kbest  */ 

/*  First  the  diagonal  entries:  */ 

A[l][l]  =  kbest; 

Alud[l][l]  =  A[l][l]; 

for  (Ctrl  =2;  ctrl<=(m+l);  ctrl++)  { 

A[ctrll[ctrl]  =  0.0; 

for  (ctr2=l;  cti2<=kbest;  ctr2++)  { 

A[ctrl]Ictrl]  =  A[ctrl][ctrl] 

+  x[(int)(d[i][ctr2])-(ctrl-2)*tau] 

*  x((int)(d[i][cti2])-(ctr  1 -2)*tau]; 

} 

Alud[ctrl][ctrl]  =  A[ctrl][ctrl]; 

} 

/*  Now  the  first  row  (and  first  column)  entries:  */ 
for  (col=2;  col<=(ni+l);  col++)  { 

A[l][coll  =  0.0; 

for  (1=1;  l<=kbest;  1++)  { 
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A[ll[coll  =  A[ll[coll  +  x[(int)(d[il[ll)-(col-2)*tau]; 

} 

Alud[l][col]  =  A[l][col]; 

A[col][l]  =  AIULcol]; 

Alud[col][l]  =  A[coll[lI; 

} 

/*  Now  initialize  the  off-diag,  off-first-row-or-col  entries;  */ 
for  (row=2;  row<=m;  row++)  { 

for  (col=row+l;  col<=(m+l);  col++)  { 

A[row][col]  =  0.0; 
for  (1=1;  l<=kbest;  1++)  { 

A[row][col]  =  A[row][col] 

+  x[(intXd[i][l])-(row-2)*tau] 

*  x[(intXd[il[ll)-(col-2)*tau]; 

} 

Alud(row][col]  =  A[row][col]; 

A(col][row]  =  A[row][col]; 

Alud[col][row]  =  A[col][row]; 

} 

} 

/*  And  last,  initialize  the  b  vector,  and  its  equal  alpha  vector; 

alpha  gets  replaced  with  the  proper  solution  when  A*alpha  =  b  is 
solved  as  A*x=b=alpha;  see  Press,  page  44.  */ 
b[l]  =  0.0; 

for  (1=1 ;  l<=kbest;  1++)  { 

b[l]  =  b[l]  +  x[(intXd[i][ll)+T]; 

} 

alpha[l]  =  b[l]; 

for  (row=2;  row<=(m+l);  row++)  { 
b[row]  =  0.0; 
for  (1=1;  l<=kbest;  1++)  { 
b[rowl  =  b[row] 

+  x[(intXd[i][l])-(row-2)*taul 
*x((intXdfil[l])+T]; 

} 

alpha[row]  =  b[row]; 

} 

k  =  kbest  -  2*(m+l); 

/*  Solve  the  normal  eqtns  for  alpha[l]  thru  alphaIm+1]  */ 
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ludcmp(Alud,FLAG,m+ 1  ,indx,&dnr); 

if(FLAG=l){ 

alpha[l]  =  1000001; 

FLAG=0;} 
else  { 

lubksb(Alud,m+ 1  ,indx,alpha); 

} 

/*  alpha[l],alpha[2],...  are  now  optimum  in  Casdagli’s  eqtn  S,  if 
the  normal  equations  admit  a  solution.  Otherwise  set  alpha[l] 
=  x[i+Tl,  alpha[2]=alpha[3]=...=alpha[m+l]=0,  so  that 
xhat[k][i+T]  =  x[i+T],  the  exact  data  value.  Also,  decrement 
nbrtested  so  this  unusual  event  isn’t  included  in  Em[k].  *! 

for  (ctrl=l;  ctrl<=(m+l);  ctrl++)  { 
if  ((fabs(alpha[ctrll)  >  1000000)11 
(alphalctrl]  =  HUGE.VAL))  { 
nbrtested[k]  =  nbrtested[k]-l; 
alpha!  1]  =  x[i+T]; 
for  (ctr2=2;  ctr2<=:(m+l);ctr2++)  { 
alpha[ctr2]  =  0.0; 

} 

break; 

} 

} 

for  (q=l;  q  <=  kbest;  q++)  { 
xlr[q]  =  alpha[l]; 

for  (Ctrl  =2;  Ctrl  <=  (m+1);  ctrl++)  { 
xlr[q]  =  xlr[ql  +  alphalctrl]  * 

x[(int)(d[i][q])  -  (ctrl-2)*tau]; 

} 

} 

/*  The  value  of  each  of  the  nearest  nghbrs  x[(int)(d[i][q])]  of 
x[i]  on  the  linear  regression  hyperplane  is  now  established 
as  xlrlqj.  */ 

for  (q=l ;  q  <=  kbest;  q++)  { 

er[q]  =  fabs(xlr[q]  -  x[(intXd[i][q])+TJ); 

} 
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I*  The  error  er[q]  between  the  next  value  of  the  time  series 
corresponding  to  nghbr  q  and  the  value  of  its  linear 
regression  is  now  established  for  all  q.  *t 

moinent(er,kbest,ave_ptr,adev_ptr,sigma-ptr,svar_ptr,skew-ptr, 

curt-ptr); 

/*  Now  sigma  is  the  standard  deviation  of  the  er[q],  and  ave 
is  the  average  value  of  the  er[q].  */ 

sig  =  gamma*sigina;  t*  may  want  to  try,  eg,  sig=2.0*sigma  */ 

P=l; 

q=l; 

while  (q  <=  kbest)  { 
if  (er[q]  <  ave  +  sig)  { 
dn[p]  =  (intXd[il[q]); 

P  =  P+1; 

} 

q  =  q+l; 

} 

p  =  p-l; 

I*  Now  the  p  nearest  nghbrs  to  vctr  x[Nf+Nt]  (p<=kbest)  with  next 
vals  within  ave+sig  of  the  regression  hyperplane  determined  by 
the  kbest  nearest  nghbrs  to  vctr  xINf+Nt]  are  the  vectors 
x[dn[l]],x[dn[2]],...,x[dn[p]]  (given  in  descending  order 
of  nearness;  eg,  x[dn[l]]  is  nearest  x[Nf+Nt]).  */ 

if(p<2*(m+l)){ 

fprintf(fp2,"Too  few  near  with  p=%2d;  INCREASE  GAMMA !\n",p); 

} 

else  { 

I*  Reestablish  the  A  matrix  at  k=p.  */ 

/*  First  the  diagonal  entries:  */ 

A[ll(ll  =  p; 

Alud[l][l]  =  A[l][l]; 

for  (ctrl=2;  ctrl<=(m+l);  ctrl++)  { 

A[ctrl][ctrl]  =  0.0; 

for  (ctr2=l ;  ctr2<=p;  ctr2++)  { 

A[ctrl][ctrl]  =  AIctrl][ctrl] 

+  x[dn[ctr2]-(ctrl-2)*tau] 

•  x[dn[ctr2]-(ctrl-2)*tau]; 

} 
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Alud[ctrl][ctrl]  =  A[ctrl][ctrll; 


} 

/*  Now  the  first  row  (and  first  column)  entries:  */ 
for  (col=2;  col<=(m+l);  col++)  { 

A[ll[coll  =  0.0; 
for  (1=1;  l<=p;l++)  { 

A[l][col]  =  A[l][col]  +  x[dn[l]-(col-2)*taul; 

} 

Alud[ll[col]  =  A[ll[coll; 

A[col][l]  =  A[l][col]; 

Alud[col][l]  =  A(col][l]; 


/*  Now  init.  the  off-diag,  off-first-row-or-col  entries;  */ 
for  (row=2;  row<=m;  row++)  { 

for  (col=row+l;  col<=(m+l);  col++)  { 

A[row][col]  =  0.0; 
for(l=l;l<=p;l++)  { 

A[row][col]  =  A[row][col] 

+  x[dn[ll-(row-2)*tau] 

♦  x(dn[l]-(col-2)*tau]; 

} 

Alud[rowl[col]  =  A[row](col]; 

A[col][row]  =  A[rowj[col]; 

Alud[col][row]  =  A[col][row]; 

} 

} 


/*  And  last,  init.  the  b  vctr,  and  its  equal  alpha  vector; 
alpha  gets  replaced  with  the  right  sol.  when  one  solves 
A*alpha  =  b  as  A*x=b=alpha;  see  Press,  page  44.  ♦/ 
b[ll  =  0.0; 
for  (1=1 ;  l<=p;  1++)  { 

b[l]  =  b[lI  +  x[dn[ll+T]; 

} 

alpha[l]  =  b[l]; 

for(row=2;  row<=(m+l);  row++)  { 
b[row]  =  0.0; 
for  (1=1;  l<=p;  1++)  { 
b[row]  =  b[row] 

+  x[dn[l]-(row-2)*taul 
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*  x[dn[ll+T]; 


} 

alpha[row]  =  b[row]; 

} 

} 

/*  Solve  the  normal  eqtns  for  alpha[ll  thru  alpha[m-i-l]  */ 

ludcmp(  Alud,FLAG,in+ 1  ,indx,&dar); 

if(FLAG=l){ 

alphalU  =  1000001; 

FLAG=0;} 
else  { 

lubksb(  Alud,m-i- 1  ,indx,alpha); 


/*  alpha!  l],alpha[2],...  are  now  optimum  in  Casdagli’s  eqtn  S,  if 
the  normal  equations  admit  a  solution.  Otherwise  set  alpha!  1] 
=  x!i+T],  alpha!21=alpha!3]=...=alpha!m+l]=0,  so  that 
xhat!k]!kT]  =  x!i+T|,  the  exact  data  value.  Also,  decrement 
nbrtested  so  this  unusual  event  isn’t  included  in  Em!k].  *! 

for(ctrl=l;ctrl<=(m+l);  ctrl++)  { 
if((fabs(alpha!cul])>  1000000)11 
(alpha!ctrll  =  HUGE.VAL))  { 
nbrtested!k]  =  nbrtested!k]-l; 
alpha!l]  =  x!i+T|; 
for  (ctr2=2;  ctr2<=(m+l);ctr2++)  { 
alpha!ctr2]  =  0.0; 

} 

break; 

} 

} 

xhat!k]!i-»-T]  =  alpha!!]; 

for  (ctrl=2;  ctrl<=(m+l);  ctrl++)  { 

xhat!k]!i+T]  =  xhat!k]!i+T]  +  alpha!ctrl]*x!i-(ctrl-2)*tau]; 

} 

/*  xhat!k]!i-fT]  has  now  been  established;  it  is  the  predicted  ts 
value  at  time  i+T  =  Nf+Nt+T.  */ 
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fprintf(fip2,  "lin.  regr.  hyperplane  errors  ave  =  %3.5f\n",ave); 
fprintf(fp2,  "nbr  nghbrs  close  to  lin.  regr.  hyperplane  =  p  =  %2d\n",p); 
fprintf(fip2,  "The  predicted  value  at  time  %d  is  %f\n", 

Nf+Nt+T,xhat[kbest-2*(m+l)][Nf+Nt+T]); 
fprintf(fp2,  "The  actual  value  at  time  %d  is  %f\n"JsIf+Nt+Xx[Nf+Nt+T]); 

free  jdvector(x,  1  ,Nf+Nt+T); 

free  jdvector(xlr,  1  Jcbest); 

free  jdvector(er,  1  ,kbest); 

free  jdvector(dhold,  1  ,Nf-T-(m- 1  )*tau); 

freejdvector(alpha,l  ,m+l ); 

freejdvector(b,  1  ,m+ 1 ); 

free  jdvector(Em,0,Nf-T-(m- 1  )*tau-2*(m+l )); 

free  Jvector(indx,  1  ,m+ 1 ); 

free  jvector(nbrtested,0,Nf-T-(m- 1  )*tau-2*(m+ 1 )); 

free  Jvector(dn,  1  ,kbest); 

free  jdmatrix(  A,  1  ,m+ 1 , 1  ,m+ 1 ); 

free  jdmatrix(Alud,  1  ,m+ 1 , 1  ,m+ 1 ); 

freejdmatrix(d,Nf,Nf+Nt-T,  1  ,Nf-T-(m-l  )*tau); 

freejdmatrix(xhat,0,Nf-T-(m-l)*tau-2*(m+l),Nf+T,Nf+Nt); 

freejdmatrix(e,0,Nf-T-(m-l)*tau-2*(m+l),Nf,Nf+Nt-T); 

fclose(fpl); 

fclose(fp2); 


C.2.4  Overlap  Prediction. 

/*  This  program,  casdaglil7.c,  uses  the  best  ra  and  k  as  found  from  previous 
runs  of  casdagli7.c  to  prepare  for  pred.  of  Nf+Nt+1  =  Nf+1  for  a  given 
value  of  T  =  tau  (a  common  best  compromise  m  and  k  are  used  throughout 
casdaglil7.c  and  casdaglilS.c).  Written  Jan  1994  by  Jim  Stright,  it  is 
derived  from  casdagli7.c.  Progam  casdaglil7  c  provides  as  outputs  the 
files  tauxdata  containing  the  right  endpoints  of  the  k  ngbrs  nearest 
the  point  Nf+Nt+1 -taux,  where  taux  is  the  value  of  tau  =  T  used  and  "x" 
is  replaced  with  1  and  2,  corresponding  to  short  and  long  delays  tau. 

Many  of  the  subroutines  are  taken  from  the  book  by  Press  et  al, 

"Numerical  Recipes  in  C."  Two  runs  of  this  program  produce  the  files 
tauldata  and  tau2data  and  constitute  the  first  part  of  implementing 
"overlap  predictions."  The  file  tauldata  is  pruned  against  tau2data 
using  the  program  "casdaglilS.c."  Casdaglil7.c  is  the  first  of  three 
programs  used  to  implement  "overlap  prediction;"  after  casdaglilS.c  is 
executed,  the  actual  predicted  value  is  obtained  using  casdaglil9.c.  */ 
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#mcliide  <stdio.h> 

#mclude  <stdlib.h> 

#mclude  <inath.h> 
double  *dvector(); 
double  ♦♦dmatrixO; 
double  sort2(); 

void  freejdvector(),freejdmatrix(); 

void  inain(void) 

{ 

FILE  •fpl,  *fp2;  /*  fpl  is  tsdata  (input);  fp2  is  tauxdata  */ 

I*  CHANGE  NAME  TAUXDATA  BELOW  WITH  EACH  CHANGE  IN  T  =  TAU  */ 
int  m  =  14;  /* 

intk=158;  /* 

int  Nf=  2670;  /* 

intNt  =  0;  /* 

intT  =  4;  /* 

int  tau  =  T,  /* 

int  i;  /* 

intj,ctrl,ctr2,ctr3;  /* 

double  *x; 
double  *dhold; 
double  **d; 

X  =  dvector(l,Nf+Nt); 
dhold  =  dvector(l,Nf+Nt-m*tau); 
d  =  dinatrix(Nf+Nt+l-tau,Nf+Nt+l-tau,l,Nf+Nt-m*tau); 

/♦  d[i][l]  is  the  distance  from  vector  x[i]  to  vector  x[l+(m-l )*tau]; 
d[i][2]  is  the  distance  from  vector  x[i]  to  vector  x[l+(m-l)*tau+l]; 

d[i][Nf+Nt-m*tau]  is  distance  from  vctr  x[i]  to  vctr  xINf+Nt-tau], 
before  a  swap  for  nearness  is  performed.  *! 

I*  open  tsdata  for  input  */ 
if  ((fpl  =  fopen("tsdata","r"))  =  NULL)  { 
printf("Cannot  open  file  tsdataXn"); 
exit(l); 

} 

I*  open  tauxdata  for  output  */ 


embedding  dim;  from  casdagliV.c’s  output  *! 
from  casdagli7.c’s  output;  compromise  kbest  */ 
nbr  of  time  series  values  in  fitting  set  */ 
nbr  of  ts  vals  in  testing  set;  fixed  at  0  */ 

T  =  tau;  eg,  1st  taul  =  4,  then  tau2  =  7  */ 
delay  time;  must  equal  T  now  */ 
same  as  Nf  +  Nt  +  1  -  tau  */ 
counters  */ 
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if  ((fp2  =  fopeii("tauldata'',  "w"))  =  NULL)  { 
printf("Cannot  open  file  tauxdataXn**); 
exit(l); 


/*  read  in  the  time  series  data  */ 
for  (ctrl=l;  ctrl<=Nf+Nt;  ctrl++)  { 
fscanf(fpl,  &x[ctrl]); 

} 

/*  compute  distances  d[i]|j]  and  load  d  matrix  with  nearness  indices  */ 
i  =  Nf  +  Nt+l-tau; 

for  (j=l;  j<=Nf-T-(m-l)*tau;  j++)  {  /*  see  indexing  note  below  */ 
d[iHj]  =  fabs(x[i]  -  x|j+(m-l)*taul); 
for  (ctrl=tau;  ctrl<=(m-l)*tau;  ctrl=ctrl+tau)  { 
if  (fabs(x[i-ctrl]-xlj+(m-l)*tau-ctrl])  >  d[i]|j])  { 
d[i][j]  =  fabs(x[i-ctrl]-x|j+(m-l)*tau-ctrl]); 

} 

} 

/*  dist  d[i]|j]  between  vctrs  x[i]  &  x[j+(m-l)*tau]  is  fixed  */ 

} 

f*  the  distances  d[i](j]  are  now  established  for  all  j  */ 

t*  initialize  the  index-swap  vector  dhold  */ 
for  (ctr2=l;  ctr2<=Nf-T-(m-l)*tau;  ctr2++)  { 
dhold[ctr2]  =  ctr2  +  (m-l)*tau; 

/*  now  the  contents  of  dhold[l],  eg,  is  l+(m-l)*tau  */ 

} 

/*  Sort  the  contents  of  the  vector  d[i]  and  simultaneously  sort  the 
vector  dhold  into  ascending  order  of  nearness  of  vectors  to  x[i]; 
see  Press  Ed  2,  page  334.  */ 
sort2(Nf-T-(m-l)*tau,  d[i],  dhold); 

/*  replace  contents  of  vector  d[i]  with  indices  of  vectors  arranged 
in  ascending  order  of  nearness  to  x[i]  */ 
for  (ctr3=l;  ctr3<=Nf-T-(m-l)*tau;  ctr3++)  { 
d[il[ctr31  =  dhold[ctr3]; 

} 

/*  Now  the  contents  of  vector  d[i]  is  the  set  of  indices  of  vectors 
compared  for  nearness  to  vector  x[i],  arranged  in  ascending  order 
of  nearness  to  x[i];  eg,  d[i][l]  is  index  of  vctr  nearest  x[i].  */ 
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for(j=l;j<=k;j++)  { 

fprmtf(fp2,"%d\n",(mtXd[i][jl)); 

} 

fiee-dvector(x,  1 f+Nt); 

freejdvector(dhold,l,Nf+Nt-m*tau); 

freejdmatrix(d  J*If+Nt+l  -tau  J4f+Nt+ 1  -tau,  1  ,Nf+Nt-m*tau); 

fclose(fpl); 

fclose(fjp2); 


I*  This  program,  casdaglilS.c,  inputs  two  sets  of  right  endpoints  of  intervals 
corresponding  to  two  sets  of  nearest  ngbrs  for  two  different  values  of  tau, 
here  denoted  taul  and  tau2.  The  left  endpoints  corresponding  to  the  most 
distant  past  component  of  each  interval  are  computed  and  stored.  The 
resulting  sets  of  intervals  are  checked  for  overlap;  the  strategy  is  to 
retain  small  intervals  which  overlap  bigger  ones.  The  retained  right 
endpoints  are  output  to  be  used  for  prediction  in  program  casdaglil9.c. 
Written  Jan  1994  by  Jim  Stright,  program  casdaglilS.c  is  used  after  the 
program  casdaglil7.c  as  one  of  three  programs  which  together  implement 
"overlap  prediction."  It  uses  compromise  values  of  k  &  m  as  found 
from  previous  mns  of  casdagliT.c;  it  also  relies  on  casdagliT.c  for  the 
best  two  delays  taul  and  tau2  (tau2  >  taul)  to  use  for  prediction.  Some 
subroutines  are  taken  from  Press  et  al,  "Numerical  Recipes  in  C."  */ 

#include  <stdio.h> 

#include  <stdlib.h> 

#include  <math.h> 
int  ♦ivectorO; 
void  free  JvectorO; 

void  main(void) 

{ 

FILE  *fpl,  ♦fp2; 

FILE  *fp3; 
intm=  14; 
intk=  158; 
inttaul=4; 
int  tau2  =  7; 
int  a  =  0; 


/*  fpl  is  tauldata,  fp2  is  tau2data  (inputs)  */ 

/*  fp3  is  goodngbrs  (output)  */ 

/*  embedding  dim;  from  casdagli7.c’s  output  */ 
/*  from  casdagli7.c’s  output;  compromise  k  */ 
/*  delay  time  for  small  intervals  */ 

I*  delay  time  for  big  intervals  */ 

I*  nbr  small  intervals  retained;  initially  0  */ 
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inti=  1; 
intj  =  1; 
intq; 
int  *srep; 
int  •btep; 
int  *slep; 
int  *blep; 
int  ♦dn; 


step  =  ivector(l,k); 
btep  =  ivector(l,k); 
step  =  ivector(14c); 
btep  =  ivector(l,k); 
dn  =  ivector(l,k); 


/*  small  interval  counter  */ 

I*  big  interval  counter  */ 

/*  general  purpose  counter  */ 

/•  right  endpoints  of  input  small  intervals  */ 
/*  right  endpoints  of  input  big  intervals  */ 

!*  left  endpoints  of  input  small  intervals  */ 

/*  left  endpoints  of  input  big  intervals  */ 

/*  right  endpoints  of  output  small  intervals  *! 


I*  open  tauldata  for  input  */ 
if  ((fpl  =  fopen("tauldata”,"r"))  ==  NULL)  { 
printf("Car'’ot  open  file  tauldataXn"); 
exit(l); 


t*  open  tau2data  for  input  */ 
if  ((fp2  =  fopen("tau2data","r"))  ==  NULL)  { 
printf("Cannot  open  file  tau2data\n"); 
exit(l); 


/*  open  goodngbrs  for  output  */ 
if  ((fp3  =  fopenC'goodngbrs",  "w"))  =  NULL)  { 
printfC'Cannot  open  file  goodngbrs  \n’’); 
exit(l); 


/*  read  in  the  tauldata  *! 
for(q=l;q<=k;q++)  { 

fscanf(fpl,  "%d",  &srep[ql); 

} 


t*  read  in  the  tau2data  */ 
for(q=l;q<=k;  q++)  { 

fscanf(fp2,  "%d",  &brep[q]); 


} 


/*  set  the  small  left  endpoints  */ 
for(q=l;q<=k;  q++)  { 

slep[q]  =  siep[q]  -  (m-l)*taul; 

} 

I*  set  the  big  left  endpoints  */ 
for  (q=l;  q<=k;  q++)  { 

blep[q]  =  brep[q]  -  (m-l)*tau2; 

} 

while  (i  <=  k)  { 

/*  If  it  is  possible  to  reject  a  small  interval  and  still  obtain 
enough  small  itervals  for  linear  regression,  do  so  if  there 
is  any  overlap.  */ 

if(a  +  (k-i+l)>2*(m+l)){ 

while  (j  <=  k)  { I*  check  intervals  i  and  j  for  overlap  */ 
if  (srep[i]  <  bleplj])  { 

j=j+i; 

} 

else  { 

if  (slep[i]  <=  brepljl)  { 
a=  a+1; 
dn[a]  =  srep[i]; 
break; 

} 

elsej=j  +  1; 

} 

} 

j  =  i; 

} 

else  { 

for(q=i;q<=k;  q++)  { 
a  =  a+  1; 
dn[a]  =  srep[q]; 

} 

break; 

} 

i  =  i+  1; 

} 

/*  Now  dn[l],  dn[2], ...  ,dn[a]  are  all  of  the  retained  right  endpts.  *! 
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Iprmtf(flp3,"a=  %d\n",a); 


for(q=l;q<=a;q++)  { 
lprmtf(fp3,  "%d  \  n",dn[ql); 

} 

free  jvector(srep,  1  Jc); 
free  Jvector(bfep,  1  ^); 
free  JvectoF(slep,  1  ,k); 
free  Jvector(blep,  1  Jc); 
fireeJvector(do,lJc); 
fciose(^l); 
fclose(fp2); 
fclose(fp3); 


/*  This  program,  casdaglil9.c,  modifies  the  forecasting  algorithm  described 
on  p.  307  of  Casdagli’s  article  "Chaos  and  Deterministic  versus  Stochastic 
Non-linear  Modelling."  Written  Jan  1994  by  Jim  Stright,  it  is  also  a 
modification  of  casdagliT.c.  Casdaglil9.c  provides  a  prediction  of 
a  single  value  beyond  the  end  of  the  data  used  for  testing  .  It  does  so 
using  the  best  m  and  k  (kbest)  as  found  from  previous  runs  of  program 
casdagiiT.c.  Rather  than  use  all  kbest  nearest  nghbrs,  it  uses  only  those 
knext  of  them  derived  from  overlapping  intervals  from  the  best  2  tau  values 
(those  used  in  program  casdaglilS.c).  The  smaller  T=tau  (taul)  is  used; 
also  used  is  the  number  "a"  of  neighbors  which  is  taken  from  the  file 
"goodngbrs"  provided  by  program  casdaglilS.c.  Casdaglil9.c  is  the  final 
one  of  three  programs  used  to  implement  "overlap  prediction;"  the  order  of 
execution  is  casdt^lilT.c  (twice),  casdt^lilS.c,  then  casdaglil9.c.  Many 
of  the  subroutines  arc  taken  from  Press  et  al,  "Numerical  Recipes  in  C."*/ 

#include  <stdio.h> 

#include  <stdlib.h> 

#include  <math.h> 

#defineTINY1.0e-20 
#defineBIG1.0e20 
double  *dvector(); 
int  *ivector(); 
double  ••dmatrixO; 
double  momentO; 
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double  ludcmpO; 
double  lubksb(); 

void  fiteejdvectot(),freejdmatrix(),frce  JvectorO; 


void  maiQ(void) 

{ 

FILE  *fpl.  *^2; 
FILE  *fp3\ 
int  koriginal  =  158; 
inta=  151; 
int  111=14; 
int  Nf  =  2670; 
int  Nt  =  0; 
intT  =  4; 
int  tau  =  T; 
int  taul  =  P, 
int  tau2  =  7; 
int  lastpred  =  0; 
int  ij,ctrl,cti2,ctr3; 
int  row,col,l,q; 
int  k  =  a; 
int  kbest  =  a; 
int  knext  =  a; 
int  klast  =  0; 


/*  fpl  is  tsdata,  fp2  is  goodngbrs  (inputs)  *! 

I*  fp3  is  casdata  (output)  */ 

I*  original  nbr  of  intervals;  from  casdagUH.c  */ 

/*  nbr  kept  intrvls  (STRIPPED  FROM  GOODNGBRS!)  *! 
/*  embedding  dim;  from  casdagli7.c’s  output  */ 

I*  nbr  of  time  series  values  in  fitting  set  */ 

/*  nbr  of  ts  vals  in  testing  set;  fixed  at  0  */ 

/*  forecasting  time;  =  taul  of  casdaglilS.c  */ 

/*  delay  time;  must  equal  T  now  */ 

/*  the  smaller  of  the  two  tau  values  */ 

/*  larger  tau  value;  can  get  from  casdaglilS.c  */ 

/*  number  of  desired  predictions  minus  one  */ 

/*  counters  ♦/ 

/*  more  counters  */ 

/*  same  as  a  above  */ 

/♦  same  as  a  above  */ 
f*  same  as  a  above  */ 

/*  This  counter  is  at  the  nbr  (+2*(m+l ))  of  the 
last  nearest  neighbor  incorporated  in  the 
A  matrix  */ 


int  Ns  =  1; 
intn; 

int  FLAG  =  0; 
int  kexp  =  0; 
double  kbase  =  2.0; 
double  xhatn; 


/*  spacing  of  the  sampled  delay  vectors  */ 

/*  required  for  call  to  "moment"  */ 

/*  used  in  ludcmp  for  "too  large"  check  */ 

/*  counter  for  exponential  spacing  of  k’s  */ 

/*  base  for  exponential  spacing  of  k’s  */ 

/*  the  (repeatedly  replaced)  predicted  value  */ 
double  ave,adev,sigma,svar, skew, curt; 

/*  all  of  these  required  for  call  to  "moment", 
although  only  ave  and  sigma  are  used  in 
casdaglilP.c;  see  Press  Ed  2,  p.613  */ 


double  *ave_ptr^&ave,*adev_ptr=&adev,*sigma-ptr^&sigma; 
double  *svar4)tr=&svar,*skew_ptr^&skew,*curt^tr^&curt; 
double  *x,'*xlr,*er; 

double  *dadd;  /*  daddlj]  is  distance  from  (m-t-1  )-tuple  ending  at 

x|j+m*tau]  to  the  final  (m+l)-tuple  in  the  time  series  */ 
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double  •♦A,**Alud,**d,*alpha,*b,dnr; 

I*  Alud  is  repeatedly  destroyed  by  ludcmp,  dnr  is  Press’s  d,  p.46  */ 
double  *dhold;  /*  used  to  index  nearest  m-tuples  *! 
double  *dahold;  t*  used  to  index  nearest  (ni+l)-tuples  */ 
int  *indx; 
int  ‘abnested; 

int  *dn;  !*  used  to  index  nearest  nghbrs  close  to  lin.  reg.  line  */ 
int  *dim;  /*  used  to  index  nghbrs  a  little  farther  away  *! 
int  *gngbrs;  i*  the  nbr  of  nghbrs  used,  and  their  values  */ 

X  =  dvectorfl  J4f+Nt+(lastpred+l)*T); 

xlr  =  dvector(l  Jcbest); 

er  =  dvector(l,kbest); 

dadd  =  dvector(l-tau,Nf-T-m*tau); 

dhold  =  dvector(l,Nf-T-(m-l)*tau); 

dahold  =  dvector(l-tau,Nf-T-m*tau); 

indx  =  ivector(l,m+l); 

dn  =  ivector(l,kbest); 

dnn  =  ivector(l,kbest); 

gngbrs  =  ivector(l,a); 

A  =  dnaiatrix(l,ni-»-l,l,m+l); 

Alud  =  dinatrix(l,in+l,l,m+l); 
d  =  draatrix(Nf+Nt,Nf+Nt+lastpred*T,l,Nf-T-(m-l)*tau); 

/•  d[i][l]  is  the  distance  from  vector  xli]  to  vector  x[l+(m-l)*tau]; 
d[i][2]  is  the  distance  from  vector  x[i]  to  vector  x[l+(m-l)*tau+l]; 

d[i][Nf-T-(m-l)*tau]  is  distance  from  vector  x[i]  to  vector  x[Nf-T], 
before  a  swap  for  nearness  is  performed.  */ 
alpha  =  dvector(l,m+l); 
b  =  dvector(l,m+l); 

/♦  open  tsdata  for  input  •/ 
if  ((fpl  =  fopen("tsdata","r"))  =  NULL)  { 
printfCCannot  open  file  tsdataXn”); 
exit(l); 

} 

/*  open  goodngbrs  for  input  *! 
if  ((fp2  =  fopen("goodngbrs","r"))  =  NULL)  { 
printfCCannot  open  file  goodngbrs  \n"); 
exit(l); 

} 
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I*  open  casdata  for  output  */ 
if  ((fip3  =  fopenC'casdata",  "w"))  ==  NULL)  { 
printf("Cannot  open  file  casdataXn”); 
exit(l); 

} 

/*  read  in  the  time  series  data  *! 
for  (ctrl=l;  ctrl<=Nf+Nt+l;  ctrl++)  { 
fscanf(fpl,  "%ir,  &x[ctrl]); 

} 

I*  read  in  the  right  endpoints  of  neighboring  intervals  data  *! 
for  (ctrl=l;  ctrl<=a;  ctrl++)  { 

fscanf(^2,  "%d",  &gngbrs[ctrl]); 

} 

I*  Find  standard  dev  sigma  for  the  time  series;  see  Press,  page  613.  */ 
n  =  Nf+Nt; 

moment(x,n,ave_ptr,adev_ptr,sigma_ptr,svar4)tr,skew^tr,curt-ptr); 

fprintf(fp3,"Data  output  from  program  casdaglil9.c\n’’); 
fprintf(fp3,"m=%2d\n",m); 

fprintf(fp3,"using  taul=%d  and  tau2=%d\n",taul,tau2); 
fprintf(fp3,"oiiginal  nbr  nearest  nghbrs  =  %d\n",koriginal); 
fprintf(fp3,"nbr  nearest  nghbrs  retained  a  =  %3d\n",a); 
fprintf(fp3,"average  data  value  ave  =  %2.5f\n",ave); 
fprintf(fp3,"data  standard  deviation  sigma  =  %2.5f\n", sigma); 
fprintf(fp3,"ts  index  true  val  pred  val\n"); 

i=Nf+Nt+l-T;  I*  A  fixed  value  in  this  program  *! 

/*  Proceed  to  predict  from  the  m-tuple  ending  at  x[i],  using  as 
nearest  ngbrs  the  knext  ngbrs  nearest  to  x[i]  (with  components 
spaced  tau  units  apart).  That  is,  use  as  the  m-tuple  neighbors 
of  the  m-tuple  x[i],  the  m-tuples  with  indices  gngbrs[l], 
gngbrs[2], ... ,  gngbrs[knext].  */ 

!*  Establish  the  A  matrix  at  k=knext.  */ 
t*  First  the  diagonal  entries:  */ 

A[l][l]  =  knext; 

Alud[ll[ll  =  A[ll[ll; 
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for  (ctrl=2;  ctrl<=(m+l);  ctrl++)  { 

A[ctrll[ctrl]  =  0.0; 

for  (ctr2=l;  ctr2<=knext;  ctr2++)  { 

A[ctrl][ctrl]  =  A[ctrl][ctrl] 

+  x[gngbrs[cti2]-(ctrl-2)*taul 
*  x[gngbrs[ctr2]-(ctrl-2)*tau]; 

} 

Alud[ctrl](ctrl]  =  A[ctrl][ctrl]; 

} 

I*  Now  the  first  row  (and  first  column)  entries:  ♦/ 
for  (col=2;  col<=(m+l);  col++)  { 

A[l][col]  =  0.0; 

for  (1=1;  l<=knext;  1++)  { 

A[ll[col]  =  A[l][col]  +  x[gngbrs[lHcol-2)*taul; 

} 

Alud[l][col]  =  A[l][col]; 

A[col][l]  =  A[l][col]; 

Alud[col][l]  =  A(col][l]; 


/*  Now  initialize  the  off-diag,  off-first-row-or-col  entries:  */ 
for  (row=2;  row<=m;  row++)  { 

for  (col=row+l;  col<=(m+l);  col++)  { 

A[row][col]  =  0.0; 
for  (1=1;  l<=knext;  1++)  { 

A[row][col]  =  A[row][col] 

+  x[gngbrs[ll-(row-2)*tau] 

•  x[gngbrs[l]-(col-2)*tau]; 

} 

Alud[row][col]  =  A[row][coll; 

A(col][row]  =  A[row][col]; 

Alud[coll[row]  =  A[coll[row]; 

} 

} 

I*  And  last,  initialize  the  b  vector,  and  its  equal  alpha  vector; 
alpha  gets  replaced  with  the  right  solution  when 
A*alpha  =  b  is  solved  as  A*x=b=alpha;  see  Press,  page  44.*/ 
b[l]  =  0.0; 

for  (1=1 ;  l<=knext;  1++)  { 

b[l]  =  b[l]  +  x[gngbrs[l]+Tl; 
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} 

alphall]  =  b[l]; 

for  (row=2;  row<=(m+l);  row++)  { 
b[row]  =  0.0; 
for  (1=1;  l<=knext;  1++)  { 
b(rowl  =  b[rowl 

+  x[gngbrs[l]-(row-2)*tau] 

*  x[gngbrs[l]+T]; 

} 

alpha[row]  =  b[row]; 

} 

!*  Solve  the  normal  eqtns  for  alpha[l]  thru  alpha[m+l]  */ 
ludcmp(Alud,FLAG,m+l  ,indx,&dnr); 

if(FLAG=l){ 

alpha[l]  =  1000001; 

FLAG=0;} 
else  { 

lubksb(Alud,m-t- 1  ,indx,alpha); 

} 

I*  alpha! l],alpha[2],...  are  now  optimum  in  Casdagli’s  eqtn  5,  if 
the  normal  equations  admit  a  solution.  Otherwise  set  alpha!  1] 
=  x!i],  alpha!2]=alpha!3]=.,.=alpha!m+ll=0,  so  that 
xhatn  =  x!i],  the  previous  data  value.  */ 

for  (ctrl=l;  ctrl<=(m+l);  ctrl++)  { 
if  ((fabs(alpha!ctrl])  >  1000000)11 
(alphalctrl]  =  HUGE.VAL))  { 
alpha!l]  =  x!ij; 

for  (ctr2=2;  ctr2<=(m+l);ctr2++)  { 
alpha!ctr2]  =  0.0; 

} 

break; 

} 

} 

xhatn  =  alpha!!]; 

for  (ctrl=2;  ctrl<=(m+l );  ctrl++)  { 

xhatn  =  xhatn  +  alpha!ctrl]*x!i-rctrl-2)*tau]; 

} 
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/*  xhatn  has  now  been  established;  it  is  the  predicted  time  series 
value  at  time  i+T.  */ 

fprintf(fp3,"%5d  %f  %f\nM+T,xIi+T],xhatn); 

free  jdvector(x,  1  ,N  f+Nt+(lastpred+ 1  )*T); 

fteejdvector(xlr,l  .kbest); 

free  jdvector(er,  1  Jcbest); 

free  jdvectoifdadd,  1  -tau,Nf-T-m*tau); 

free  jdvectorfalpha,  1  ,m+ 1 ); 

fteejdvector(b,l  ,m+l); 

free  jdvectoifdhold,  1  ,N  f-T-(m- 1  )*  tau); 

freejdvectorfdahold,  1  -tau,Nf-T-m*tau); 

free  Jvector(indx,  1  ,m+ 1 ); 

free  Jvector(dn,  1  ,kbest); 

free  Jvector(dnn,  1  ,kbest); 

free  Jvectorfgngbrs,  1  ,a); 

free  jdmatrix(A,  1  ,m+ 1 , 1  ,m+ 1 ); 

free  jdraatrix(  Alud,  1  ,m+ 1 , 1  ,m+ 1 ); 

free  jdmatrix(d,Nf+Nt,Nf+Nt+lastpred*T,  1  ,Nf-T-(m- 1  )*tau); 

fclose(fpl); 

fclose(fp2); 

fclose(fp3); 


C.3  Subroutines 

double  moroent(data,n,ave,adev,sdev,svar,skew,curt) 
int  n; 

double  *data,*ave,*adev,*sdev,*svar,*skew,*curt; 

{ 

int  ij; 
double  s,p; 
void  nrerrorC); 

if  (n  <=  1)  nrerror("n  must  be  at  least  2  in  MOMENT"); 
s=0.0; 

for  (j=lu<=na++)  s  +=  datalj]; 

*ave=s/n; 

*adev=(*svar)=(*skew)=(*curt)=0.0; 
for(j=l  j<=n  j++)  { 
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♦adev  +=  fabs(s=datalj]-(*ave)); 

*svar  +=  (p=s*s); 

♦skew  +=  (p  *=  s); 

•curt  +=  (p  *=  s); 

} 

♦adev  /=  n; 

♦svar/=  (n-1); 

♦sdev=sqrt(^svar); 
if  (♦svar)  { 

♦skew  I-  (n^(^svar)^(^sdev)); 
♦curt=(^curt)/(n^(^svar)^(^svar))-3.0; 

}  else  nrerror("No  skew/kurtosis  when  variance  =  0  (in  MOMENT)"); 

} 

double  ludcinp(a,FLAG,n,indx4) 
int  FLAG,n,^indx; 
double  ♦♦a,^d; 

{ 

int  i,iinaxj,k; 
double  big,duin,sum,temp; 
double  ♦vv,^dvector(); 
void  nrcrror(),freejdvector(); 

vv=dvector(l,n); 

♦d=1.0; 

for(i=l;i<=n;i++)  { 
big=0.0; 

for  (j=lij<=nu++) 

if  ((temp=fabs(a[i]|jl))  >  big)  big=temp; 
if(big  =  0.0){ 

FLAG=1; 

return; 

} 

vv[i]=  1.0/big; 

} 

forO=lu<=ny++)  { 
for(i=l;i<j;i++)  { 
sum=a[il|j]; 

for  (k=l,k<idc++)  sum  -=  a[il[k]^a[k]|j]; 
a[iHjl=sum; 

} 

big=0.0; 
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for(i=j-4<=nu++)  { 
sum=a[i][j]; 
for(k=14c<j4c++) 

sum  -=  ali][k]*a[kl[j]; 
a[i]|j]=sum; 

if  ( (duiii=w[i]*fabs(sum))  >=  big)  { 
big=dum; 
imax=i; 

} 

} 

if  (j  !=  imax)  { 

for  (k=14c<=n4c++)  { 
dum=a[imax][k]; 
a(imax][k]=alj][k]; 
a|jl[k]=dum; 

} 

•d  =  -(*d); 
w[imax]=vvlj]; 

} 

indx|j]=imax; 

if(a(j]a]  =  0.0)aU][j]=TINY; 

ifO!=n){ 

dum=1.0/(a[j][j]); 

for  (i=j+l;i<=n;i++)  a[i][j]  *=  dum; 

} 

} 

free  jdvector(vv,  1  ,n); 


double  lubksb(a,n,indx,b) 
double  **a,b[]; 
int  n,*indx; 

{ 

int  i,ii=0,ipj; 
double  sum; 


for(i=l;i<=n;i++)  { 
ip=indx[i]; 
sum=b[ip]; 
b[ip]=b[i]; 

if(ii) 

for  (j=iia<=i-la++)  sum  -=  a[i]|jl*b|jl; 
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else  if  (sum)  ii=i; 
b[i]=sum; 

} 

for(i=ny>=lu-)  { 
sun^b[i]; 

for(j=i+la<=ny++)  sum  -=  a[i]|j]*b|jl; 
b[i]=sum/a[i][i]; 

} 

} 

double  sort2(n,ra.ib) 
intn; 

double  ra[],tb[]; 

{ 

int  lj,ir,i; 
double  nb,rra; 

l=(n»  1)+1; 
ir=n; 
for(;;){ 
if(l>l){ 
rra=ra[-l]; 
nb=rb(l]; 

}  else  { 
iTa^ra[ir]; 
nb=rblirl; 
ralir]=ratll; 
rt)lir]=ib[l]; 
if(-ir=l){ 
ra[l]=rra; 
rb(l]=rrb; 
return; 

} 

} 

i=i; 

j=l«  1; 
while  (j  <=  ir)  { 

if  (j  <  ir  &&  ra(jl  <  ra(j+I  ])  ++j; 
if  (rra<ra(j])  { 
ra[i]=ralj]; 
rb[i]=rb[j]; 
j  +=  (i=j); 
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} 

elsej=ir+l; 


} 

ra{i]=iTa; 

ib[il=rtb; 

} 

} 

void  nieiTor(em)r Jext) 
char  error-text[]; 

{ 

void  exitO; 


fprii)tf(stdeiT,  "Numerical  Recipes  run-time  error...  \n"); 
fprintf(stderr,"%s  \  n",error_text); 
fprintf(stderr,"...now  exiting  to  system...\n"); 
exit(l); 


int  *ivector(nl,nh) 
intnl,nh; 

{ 

int*v; 


v=(int  *)iiialloc((unsigned)  (nh-nl+ll^sizeofCint)); 
if  (!v)  nrerror("allocation  failure  in  ivector()"); 
return  v-nl; 


float  ♦vector(nl,nh) 
int  nl,ah; 

{ 

float  ♦v; 


v=(float  *)malloc((unsigned)  (nh-nl+l)*sizeof(float)); 
if  (!v)  nrerror("allocation  failure  in  vector()"); 
return  v-nl; 


double  *dvector(nl,nh) 
intnl,nh; 

{ 
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double  ♦v; 


v=(double  *)malloc((unsigned)  (nh-iil+l)*sizeof(double)); 
if  (!v)  iiierror("allocation  failure  in  dvectorf)"); 
return  v-nl; 


float  **inatrix(nrl,nrh,ncl,nch) 
int  nrl,nrh,ncl,nch; 

{ 

inti; 

float  **01; 


n^(float  **)  malloc((unsigned)  (nrh-nrl+l)*sizeof(float*)); 
if  (!m)  nrenorC'allocation  failure  1  in  matrixO”); 
m  -=  nrl; 


for(i=nrld<=nrhd++)  { 

m[i]=(float  ♦)  malloc((unsigned)  (nch-ncl+l)*sizeof(float)); 
if  (!m[i])  nrerror("allocation  failure  2  in  matrixO"); 
m[i]  -=  ncl; 

} 

return  m; 

} 

double  **dmatrix(nrl,nrh,ncl,nch) 
int  nrl,nrh,ncl,nch; 

{ 

inti; 

double  **m; 

m=(double  **)  malloc((unsigned)  (nrh-nrl+l)*sizeof(double*)); 
if  (!m)  nterror("allocation  failure  1  in  dmatrixO"); 
m  -=  nrl; 

for(i=nrld<=nrh;i++)  { 

m[iMdouble  *)  iiialloc((unsigned)  (nch-ncl+l)*sizeof(double)); 
if  (!m[i])  nreiTor("allocation  failure  2  in  dmatrixO"); 
m[i]  -=  ncl; 

} 

return  m; 

} 
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void  free  Jvector(v,iil,nh) 
int  •v.nl.nh; 

{ 

fitee((char*)  (v+nl)); 

} 

void  free-vector(v,nl,nh) 
float  ♦v; 
int  nl,nh; 

{ 

free((char*)  (v+nl)); 

} 

void  freejdvector(v,nl,nh) 
double  ♦v; 
int  nl,nh; 

{ 

ftee((char*)  (v+nl)); 

} 

void  free-niatrix(m,nrl,nrh,nclnch) 

float  **m; 

int  nrl,nrh,ncl,nch; 

{ 

int  i; 

for(i=nrh;i>=nrl;i-)  free((char*)  (m[i]+ncl)); 
free((char*)  (m+nrl)); 

} 

void  fieejdinatrix(m,nrl,nrh,ncl,nch) 

double  **01; 

int  nrl,nrii,ncl,nch; 

{ 

inti; 

for(i=nrh;i>=nrl;i-)  free((char*)  (ra[il+ncl)); 
fitee((char*)  (m+nrl)); 

} 
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